Posted to dev@sqoop.apache.org by "Kathleen Ting (Created) (JIRA)" <ji...@apache.org> on 2012/04/02 23:27:24 UTC

[jira] [Created] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Split-by specification incorrectly triggers bounding value query
----------------------------------------------------------------

                 Key: SQOOP-474
                 URL: https://issues.apache.org/jira/browse/SQOOP-474
             Project: Sqoop
          Issue Type: Bug
          Components: build, connectors/generic
    Affects Versions: 1.4.2-incubating
            Reporter: Kathleen Ting
            Assignee: Kathleen Ting


To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
{code}

This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
{code}

The problem is that the --split-by specification triggers construction of the bounding value query. However, specifying --split-by is redundant when the number of mappers is 1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247764#comment-13247764 ] 

jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/
-----------------------------------------------------------

(Updated 2012-04-05 22:14:27.818185)


Review request for Sqoop and Bilung Lee.


Summary
-------

Before constructing the bounding value query, check not only that the user has specified a split-by option but also whether the number of mappers is 1; with a single mapper, the query is not needed and is skipped.


This addresses bug SQOOP-474.
    https://issues.apache.org/jira/browse/SQOOP-474


Diffs
-----

  ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1309506 

Diff: https://reviews.apache.org/r/4614/diff


Testing
-------

Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).


Thanks,

Kathleen
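
The console check described under Testing could be automated along the following lines. This is only a sketch: the class and method names are hypothetical and not part of Sqoop, and the only string taken from the thread is the `BoundingValsQuery` marker in the logged line.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Sqoop): scans captured console output
// for the log marker quoted in the issue description.
final class BoundingQueryLogCheck {
    static final String MARKER = "BoundingValsQuery";

    // Returns true if any captured line mentions the bounding value query.
    static boolean mentionsBoundingQuery(List<String> consoleLines) {
        for (String line : consoleLines) {
            if (line.contains(MARKER)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The line reported in SQOOP-474 (output before the fix).
        List<String> beforeFix = Arrays.asList(
            "12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: "
            + "SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B "
            + "ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1");
        // Stand-in for output that contains no such line.
        List<String> afterFix = Collections.emptyList();

        System.out.println(mentionsBoundingQuery(beforeFix));
        System.out.println(mentionsBoundingQuery(afterFix));
    }
}
```

In practice the list would be filled from the captured output of the import command in the issue description.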


                
> Split-by specification incorrectly triggers bounding value query
> ----------------------------------------------------------------
>
>                 Key: SQOOP-474
>                 URL: https://issues.apache.org/jira/browse/SQOOP-474
>             Project: Sqoop
>          Issue Type: Bug
>          Components: build, connectors/generic
>    Affects Versions: 1.4.2-incubating
>            Reporter: Kathleen Ting
>            Assignee: Kathleen Ting
>         Attachments: SQOOP-474-1.patch, SQOOP-474.patch
>
>
> To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
> {code}
> $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
> {code}
> This import will output the following:
> {code}
> 12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
> {code}
> The problem is that the bounding value query construction is being triggered because of the --split-by specification. However specifying split-by is redundant given that the number of mappers is 1.


        

[jira] [Updated] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathleen Ting updated SQOOP-474:
--------------------------------

    Affects Version/s:     (was: 1.4.2-incubating)
                       1.4.1-incubating
        Fix Version/s: 1.4.2-incubating
    
> Split-by specification incorrectly triggers bounding value query
> ----------------------------------------------------------------
>
>                 Key: SQOOP-474
>                 URL: https://issues.apache.org/jira/browse/SQOOP-474
>             Project: Sqoop
>          Issue Type: Bug
>          Components: build, connectors/generic
>    Affects Versions: 1.4.1-incubating
>            Reporter: Kathleen Ting
>            Assignee: Kathleen Ting
>             Fix For: 1.4.2-incubating
>
>         Attachments: SQOOP-474-1.patch, SQOOP-474.patch
>
>
> To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
> {code}
> $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
> {code}
> This import will output the following:
> {code}
> 12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
> {code}
> An embedded query fails in DB2 when using the 'with ur' syntax. This also fails for Informix if the version of Informix doesn't support embedded queries. The issue is the 'with ur' syntax, without which, the boundary query is harmless. The boundary query is being triggered because of the split-by specification. However specifying split-by is redundant given that the number of mappers is 1.


        

[jira] [Updated] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathleen Ting updated SQOOP-474:
--------------------------------

    Attachment: SQOOP-474-1.patch

Rebased on SQOOP-468.
                

        

[jira] [Updated] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathleen Ting updated SQOOP-474:
--------------------------------

    Description: 
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
{code}

This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
{code}

An embedded query fails in DB2 when it uses the 'with ur' syntax. It also fails in Informix if the Informix version does not support embedded queries. The issue is the 'with ur' syntax; without it, the boundary query is harmless. The boundary query is triggered by the split-by specification. However, specifying split-by is redundant when the number of mappers is 1.

  was:
To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
{code}
$ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
{code}

This import will output the following:
{code}
12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
{code}

The problem is that the bounding value query construction is being triggered because of the --split-by specification. However specifying split-by is redundant given that the number of mappers is 1.
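
To make the revised description's point about 'with ur' concrete, the sketch below mimics the shape of the logged BoundingValsQuery, in which the user's query is wrapped as a derived table. The class and method are hypothetical and are not Sqoop's implementation, and the spacing of the real log line is not reproduced exactly. It only shows that a trailing `WITH UR` clause ends up inside the parentheses, which is the case the description reports as failing on DB2.

```java
// Illustrative only: reproduces the general shape of the logged
// BoundingValsQuery, not Sqoop's actual code.
final class BoundingQueryShape {

    // Wraps the user's free-form query as a derived table named t1.
    // Substituting (1 = 1) for $CONDITIONS follows the logged output.
    static String wrap(String splitCol, String userQuery) {
        String resolved = userQuery.replace("$CONDITIONS", "(1 = 1)");
        return "SELECT MIN(" + splitCol + "), MAX(" + splitCol + ") FROM ("
            + resolved + ") AS t1";
    }

    public static void main(String[] args) {
        // The query from the issue, with a trailing WITH UR added for illustration.
        String userQuery = "SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) "
            + "WHERE $CONDITIONS WITH UR";
        // The isolation clause is now nested inside the subquery.
        System.out.println(wrap("AID", userQuery));
    }
}
```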

    
> Split-by specification incorrectly triggers bounding value query
> ----------------------------------------------------------------
>
>                 Key: SQOOP-474
>                 URL: https://issues.apache.org/jira/browse/SQOOP-474
>             Project: Sqoop
>          Issue Type: Bug
>          Components: build, connectors/generic
>    Affects Versions: 1.4.2-incubating
>            Reporter: Kathleen Ting
>            Assignee: Kathleen Ting
>         Attachments: SQOOP-474-1.patch, SQOOP-474.patch
>
>
> To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
> {code}
> $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
> {code}
> This import will output the following:
> {code}
> 12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
> {code}
> An embedded query fails in DB2 when using the 'with ur' syntax. This also fails for Informix if the version of Informix doesn't support embedded queries. The issue is the 'with ur' syntax, without which, the boundary query is harmless. The boundary query is being triggered because of the split-by specification. However specifying split-by is redundant given that the number of mappers is 1.


        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247935#comment-13247935 ] 

Hudson commented on SQOOP-474:
------------------------------

Integrated in Sqoop-ant-jdk-1.6 #107 (See [https://builds.apache.org/job/Sqoop-ant-jdk-1.6/107/])
    SQOOP-474 Split-by specification incorrectly triggers bounding value query (Revision 1310129)

     Result = SUCCESS
blee : 
Files : 
* /sqoop/trunk/src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java

                

        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244755#comment-13244755 ] 

jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/
-----------------------------------------------------------

Review request for Sqoop and Arvind Prabhakar.


Summary
-------

Before constructing the bounding value query, check not only that the user has specified a split-by option but also whether the number of mappers is 1; with a single mapper, the query is not needed and is skipped.


This addresses bug SQOOP-474.
    https://issues.apache.org/jira/browse/SQOOP-474


Diffs
-----

  ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1308530 

Diff: https://reviews.apache.org/r/4614/diff


Testing
-------

Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).


Thanks,

Kathleen


                
> Split-by specification incorrectly triggers bounding value query
> ----------------------------------------------------------------
>
>                 Key: SQOOP-474
>                 URL: https://issues.apache.org/jira/browse/SQOOP-474
>             Project: Sqoop
>          Issue Type: Bug
>          Components: build, connectors/generic
>    Affects Versions: 1.4.2-incubating
>            Reporter: Kathleen Ting
>            Assignee: Kathleen Ting
>         Attachments: SQOOP-474.patch
>
>
> To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
> {code}
> $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
> {code}
> This import will output the following:
> {code}
> 12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
> {code}
> The problem is that the bounding value query construction is being triggered because of the --split-by specification. However specifying split-by is redundant given that the number of mappers is 1.


        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247754#comment-13247754 ] 

jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------



bq.  On 2012-04-03 20:53:06, Cheolsoo Park wrote:
bq.  > Hi Kate, your patch looks good. I only want to mention that I made some changes to the same area of code in SQOOP-468, which Jarec is going to submit soon, so you will need to rebase your patch once it is committed.
bq.  > 
bq.  > In my patch, I factored out that area of code into a separate function (buildBoundaryQuery). To achieve what you're doing here, you can change the following line:
bq.  > 
bq.  > private String buildBoundaryQuery(String col, String query) {
bq.  >     if (col == null) {  // change to --> if (col == null || options.getNumMappers() == 1) {
bq.  >       return "";
bq.  >     }
bq.  >     ...
bq.  > }
bq.  > 
bq.  > I have tested this in my workspace by myself and seen no issues. Please let me know if you have any concerns/questions.

Thanks Cheolsoo. I've rebased SQOOP-474 on your SQOOP-468.


- Kathleen


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/#review6664
-----------------------------------------------------------




                

        

[jira] [Updated] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "Kathleen Ting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathleen Ting updated SQOOP-474:
--------------------------------

    Attachment: SQOOP-474.patch
    

        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245723#comment-13245723 ] 

jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/#review6664
-----------------------------------------------------------


Hi Kate, your patch looks good. I only want to mention that I made some changes to the same area of code in SQOOP-468, which Jarec is going to submit soon, so you will need to rebase your patch once it is committed.

In my patch, I factored that area of code out into a separate function (buildBoundaryQuery). To achieve what you're doing here, you can change the following line:

private String buildBoundaryQuery(String col, String query) {
    if (col == null) {  // change to --> if (col == null || options.getNumMappers() == 1) {
        return "";
    }
    ...
}

I have tested this in my own workspace and seen no issues. Please let me know if you have any concerns or questions.

- Cheolsoo
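
The suggested guard can be tried out in isolation with a sketch like the one below. It is not the committed patch. The class is a hypothetical stand-in for the surrounding job class, a plain `numMappers` field replaces `options.getNumMappers()`, and the return statement after the guard is a placeholder for the body elided as `...` in the comment above, modeled loosely on the logged query.

```java
// Hypothetical stand-in for the job class discussed in the review.
final class BoundaryQueryGuardSketch {
    // Assumption: stands in for options.getNumMappers() from the review comment.
    private final int numMappers;

    BoundaryQueryGuardSketch(int numMappers) {
        this.numMappers = numMappers;
    }

    String buildBoundaryQuery(String col, String query) {
        // Skip the boundary query when there is no split column
        // or when a single mapper makes splitting pointless.
        if (col == null || numMappers == 1) {
            return "";
        }
        // Placeholder for the elided body; the real method may differ.
        return "SELECT MIN(" + col + "), MAX(" + col + ") FROM (" + query + ") AS t1";
    }

    public static void main(String[] args) {
        String query = "SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE (1 = 1)";
        // One mapper: the guard returns an empty string.
        System.out.println("[" + new BoundaryQueryGuardSketch(1).buildBoundaryQuery("AID", query) + "]");
        // Several mappers: a boundary query is produced.
        System.out.println("[" + new BoundaryQueryGuardSketch(4).buildBoundaryQuery("AID", query) + "]");
    }
}
```

Returning an empty string keeps the existing `col == null` behavior and extends it to the single-mapper case, so callers that already handle an empty boundary query would not need to change.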




                

        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247760#comment-13247760 ] 

jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/
-----------------------------------------------------------

(Updated 2012-04-05 22:14:12.420696)


Review request for Sqoop and Arvind Prabhakar.


Changes
-------

Rebased on SQOOP-468


Summary
-------

Before constructing the bounding value query, check not only that the user has specified a split-by option but also whether the number of mappers is 1; with a single mapper, the query is not needed and is skipped.


This addresses bug SQOOP-474.
    https://issues.apache.org/jira/browse/SQOOP-474


Diffs (updated)
-----

  ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1309506 

Diff: https://reviews.apache.org/r/4614/diff


Testing
-------

Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).


Thanks,

Kathleen


                

        

[jira] [Commented] (SQOOP-474) Split-by specification incorrectly triggers bounding value query

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247780#comment-13247780 ] 

jiraposter@reviews.apache.org commented on SQOOP-474:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4614/#review6721
-----------------------------------------------------------

Ship it!


- Bilung




                