Posted to dev@sqoop.apache.org by "Kathleen Ting (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 23:27:24 UTC
[jira] [Created] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Split-by specification incorrectly triggers bounding value query
----------------------------------------------------------------
Key: SQOOP-474
URL: https://issues.apache.org/jira/browse/SQOOP-474
Project: Sqoop
Issue Type: Bug
Components: build, connectors/generic
Affects Versions: 1.4.2-incubating
Reporter: Kathleen Ting
Assignee: Kathleen Ting
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
{code}
This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE (1 = 1) ) AS t1
{code}
The problem is that the --split-by specification triggers construction of the bounding value query. However, specifying --split-by is redundant when the number of mappers is 1.
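For readers following along, here is a minimal sketch of how a bounding value query like the one logged above could be assembled from the split column and the free-form query. It is not Sqoop's actual code: the class name BoundingQuerySketch and the method boundingQuery are hypothetical, and the string layout is inferred only from the log line in this report.

```java
public class BoundingQuerySketch {

    // Hypothetical helper (not a Sqoop API): wraps the free-form query in a
    // subquery and asks for the MIN and MAX of the split column.
    public static String boundingQuery(String splitCol, String freeFormQuery) {
        // The logged query shows the $CONDITIONS placeholder replaced by an
        // always-true predicate. String.replace does a literal replacement,
        // so the '$' needs no escaping.
        String inner = freeFormQuery.replace("$CONDITIONS", "(1 = 1)");
        return "SELECT MIN(" + splitCol + "), MAX(" + splitCol + ") FROM ("
            + inner + " ) AS t1";
    }

    public static void main(String[] args) {
        String query =
            "SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS";
        // Prints a query with the same shape as the BoundingValsQuery log line.
        System.out.println(boundingQuery("AID", query));
    }
}
```

With a single mapper there is only one split, so the MIN and MAX values this query returns would not be used to divide any work.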
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247764#comment-13247764 ]
jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/
-----------------------------------------------------------
(Updated 2012-04-05 22:14:27.818185)
Review request for Sqoop and Bilung Lee.
Summary
-------
Before constructing the bounding value query, check not only that the user has specified a split-by option but also whether the number of mappers is 1, in which case the query is not needed.
This addresses bug SQOOP-474.
https://issues.apache.org/jira/browse/SQOOP-474
Diffs
-----
./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1309506
Diff: https://reviews.apache.org/r/4614/diff
Testing
-------
Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).
Thanks,
Kathleen
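The manual check described under Testing (looking for the BoundingValsQuery line in the console output) could be automated along these lines. This is only a sketch: the class BoundaryLogCheck and its method are hypothetical names, and it assumes the console output has already been captured as a list of lines.

```java
import java.util.Arrays;
import java.util.List;

public class BoundaryLogCheck {

    // Marker text taken from the log line quoted in the issue description.
    public static final String MARKER = "BoundingValsQuery";

    // Hypothetical helper: returns true if any captured console line
    // mentions the bounding value query.
    public static boolean mentionsBoundaryQuery(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
            "12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (...) AS t1");
        // Illustrative only: an empty capture stands in for output without the marker.
        List<String> after = Arrays.asList();
        System.out.println(mentionsBoundaryQuery(before));
        System.out.println(mentionsBoundaryQuery(after));
    }
}
```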
[jira] [Updated] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kathleen Ting updated SQOOP-474:
--------------------------------
Affects Version/s: 1.4.1-incubating (was: 1.4.2-incubating)
Fix Version/s: 1.4.2-incubating
[jira] [Updated] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kathleen Ting updated SQOOP-474:
--------------------------------
Attachment: SQOOP-474-1.patch
Rebased on SQOOP-468.
[jira] [Updated] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kathleen Ting updated SQOOP-474:
--------------------------------
Description:
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
{code}
This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE (1 = 1) ) AS t1
{code}
An embedded query fails in DB2 when using the 'with ur' syntax. This also fails for Informix if the version of Informix doesn't support embedded queries. The issue is the 'with ur' syntax; without it, the boundary query is harmless. The boundary query is being triggered because of the split-by specification. However, specifying split-by is redundant given that the number of mappers is 1.
was:
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
{code}
This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE (1 = 1) ) AS t1
{code}
The problem is that the --split-by specification triggers construction of the bounding value query. However, specifying --split-by is redundant when the number of mappers is 1.
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247935#comment-13247935 ]
Hudson commented on SQOOP-474:
------------------------------
Integrated in Sqoop-ant-jdk-1.6 #107 (See [https://builds.apache.org/job/Sqoop-ant-jdk-1.6/107/])
SQOOP-474 Split-by specification incorrectly triggers bounding value query (Revision 1310129)
Result = SUCCESS
blee :
Files :
* /sqoop/trunk/src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244755#comment-13244755 ]
jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/
-----------------------------------------------------------
Review request for Sqoop and Arvind Prabhakar.
Summary
-------
Before constructing the bounding value query, check not only that the user has specified a split-by option but also whether the number of mappers is 1, in which case the query is not needed.
This addresses bug SQOOP-474.
https://issues.apache.org/jira/browse/SQOOP-474
Diffs
-----
./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1308530
Diff: https://reviews.apache.org/r/4614/diff
Testing
-------
Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).
Thanks,
Kathleen
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247754#comment-13247754 ]
jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------
bq. On 2012-04-03 20:53:06, Cheolsoo Park wrote:
bq. > Hi Kate, your patch looks good. I only want to mention that I made some changes to the same area of code in SQOOP-468, which Jarec is going to submit soon, so you will need to rebase your patch once it is committed.
bq. >
bq. > In my patch, I factored that area of code out into a separate function (buildBoundaryQuery). To achieve what you're doing here, you can change the following line:
bq. >
bq. > private String buildBoundaryQuery(String col, String query) {
bq. >   if (col == null) { // change to --> if (col == null || options.getNumMappers() == 1) {
bq. >     return "";
bq. >   }
bq. >   ...
bq. > }
bq. >
bq. > I tested this in my own workspace and saw no issues. Please let me know if you have any concerns or questions.
Thanks Cheolsoo. I've rebased SQOOP-474 on your SQOOP-468.
- Kathleen
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/#review6664
-----------------------------------------------------------
[jira] [Updated] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kathleen Ting updated SQOOP-474:
--------------------------------
Attachment: SQOOP-474.patch
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245723#comment-13245723 ]
jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/#review6664
-----------------------------------------------------------
Hi Kate, your patch looks good. I only want to mention that I made some changes to the same area of code in SQOOP-468, which Jarec is going to submit soon, so you will need to rebase your patch once it is committed.
In my patch, I factored that area of code out into a separate function (buildBoundaryQuery). To achieve what you're doing here, you can change the following line:
private String buildBoundaryQuery(String col, String query) {
  if (col == null) { // change to --> if (col == null || options.getNumMappers() == 1) {
    return "";
  }
  ...
}
I tested this in my own workspace and saw no issues. Please let me know if you have any concerns or questions.
- Cheolsoo
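The guard suggested in this review can be exercised in isolation with a small self-contained sketch. The Options class below is a hypothetical stand-in for the job's options object, and the non-empty return value is a placeholder, since the rest of buildBoundaryQuery is elided in the review; only the condition itself comes from the comment above.

```java
public class BoundaryQueryGuardSketch {

    // Hypothetical stand-in for the options object; only getNumMappers()
    // is taken from the review comment.
    public static final class Options {
        private final int numMappers;

        public Options(int numMappers) {
            this.numMappers = numMappers;
        }

        public int getNumMappers() {
            return numMappers;
        }
    }

    private final Options options;

    public BoundaryQueryGuardSketch(Options options) {
        this.options = options;
    }

    public String buildBoundaryQuery(String col, String query) {
        // Skip the boundary query when no split column is given or when a
        // single mapper makes splitting pointless.
        if (col == null || options.getNumMappers() == 1) {
            return "";
        }
        // Placeholder body (assumption): the real remaining logic is elided
        // in the review and is not reproduced here.
        return "SELECT MIN(" + col + "), MAX(" + col + ") FROM (" + query + ") AS t1";
    }

    public static void main(String[] args) {
        BoundaryQueryGuardSketch oneMapper =
            new BoundaryQueryGuardSketch(new Options(1));
        BoundaryQueryGuardSketch fourMappers =
            new BoundaryQueryGuardSketch(new Options(4));
        System.out.println(oneMapper.buildBoundaryQuery("AID", "SELECT 1").isEmpty());
        System.out.println(fourMappers.buildBoundaryQuery("AID", "SELECT 1").isEmpty());
    }
}
```

Returning an empty string rather than null keeps the guard consistent with the existing col == null branch, so callers need no extra handling for the single-mapper case.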
On 2012-04-02 22:23:54, Kathleen Ting wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4614/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-04-02 22:23:54)
bq.
bq.
bq. Review request for Sqoop and Arvind Prabhakar.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Before triggering the bounding value query construction, in addition to checking that the user has specified a split by option, also take into account that the number of mappers is 1.
bq.
bq.
bq. This addresses bug SQOOP-474.
bq. https://issues.apache.org/jira/browse/SQOOP-474
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1308530
bq.
bq. Diff: https://reviews.apache.org/r/4614/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).
bq.
bq.
bq. Thanks,
bq.
bq. Kathleen
bq.
bq.
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247760#comment-13247760 ]
jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/
-----------------------------------------------------------
(Updated 2012-04-05 22:14:12.420696)
Review request for Sqoop and Arvind Prabhakar.
Changes
-------
Rebased on SQOOP-468
Summary
-------
Before constructing the bounding value query, check not only that the user has specified a split-by option but also whether the number of mappers is 1, in which case the query is not needed.
This addresses bug SQOOP-474.
https://issues.apache.org/jira/browse/SQOOP-474
Diffs (updated)
-----
./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1309506
Diff: https://reviews.apache.org/r/4614/diff
Testing
-------
Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).
Thanks,
Kathleen
[jira] [Commented] (SQOOP-474) Split-by specification incorrectly
triggers bounding value query
Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247780#comment-13247780 ]
jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/#review6721
-----------------------------------------------------------
Ship it!
- Bilung