Posted to dev@sqoop.apache.org by "Cheolsoo Park (Created) (JIRA)" <ji...@apache.org> on 2012/03/02 19:45:57 UTC

[jira] [Created] (SQOOP-451) Add format string to select query for Oracle DB

Add format string to select query for Oracle DB
-----------------------------------------------

                 Key: SQOOP-451
                 URL: https://issues.apache.org/jira/browse/SQOOP-451
             Project: Sqoop
          Issue Type: Improvement
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
            Priority: Minor


Oracle compatibility tests are fragile because the timestamp output format from the DB varies across versions. To test different versions effectively, we should make the output more deterministic.

This issue was originally raised in SQOOP-449, where using regular expressions in the compatibility tests was suggested. After discussion with Arvind and Bilung, however, we decided to add a format string for the date type to the select query for the Oracle DB so that the output format is consistent.
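
As an illustration of the regular-expression alternative mentioned above (not the approach chosen in this issue), a test helper might compare timestamp strings while tolerating an optional all-zero fractional-seconds suffix. This is only a sketch: the class name, method name, and sample strings are hypothetical and do not come from the Sqoop test suite.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sqoop.
final class TimestampMatchSketch {

    // Returns true when 'actual' equals 'expected', optionally followed by
    // a fractional part consisting only of zeros (".0", ".00", ...).
    static boolean matchesIgnoringZeroFraction(String expected, String actual) {
        // Quote the expected text so it is matched literally.
        Pattern p = Pattern.compile(Pattern.quote(expected) + "(\\.0+)?");
        return p.matcher(actual).matches();
    }

    public static void main(String[] args) {
        String expected = "2009-04-24 18:24:00"; // made-up sample value
        System.out.println(matchesIgnoringZeroFraction(expected, "2009-04-24 18:24:00.0"));
        System.out.println(matchesIgnoringZeroFraction(expected, "2009-04-24 18:24:01"));
    }
}
```

The trade-off is that the tests become more permissive, whereas a format string in the query makes the output itself uniform.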

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (SQOOP-451) Add format string to select query for Oracle DB

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224867#comment-13224867 ] 

jiraposter@reviews.apache.org commented on SQOOP-451:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4235/
-----------------------------------------------------------

Review request for Sqoop, Arvind Prabhakar and Bilung Lee.


Summary
-------

Oracle compatibility tests are fragile because the timestamp output format from the DB varies across versions. To test different versions effectively, we should make the output more deterministic.

This patch will not be submitted unless a new option is added to Sqoop so that the new behavior applies only when that option is enabled. I am posting the patch only to collect feedback.


This addresses bug SQOOP-451.
    https://issues.apache.org/jira/browse/SQOOP-451


Diffs
-----

  ./src/java/com/cloudera/sqoop/mapreduce/db/DBConfiguration.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/DBRecordReader.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBRecordReader.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/OracleDBRecordReader.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/OracleDataDrivenDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/manager/OracleManager.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/MySQLDumpImportJob.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/MySQLExportJob.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DBConfiguration.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DBInputFormat.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/OracleDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/OracleDataDrivenDBInputFormat.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/OracleDataDrivenDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/orm/ClassWriter.java 1297783 
  ./src/test/com/cloudera/sqoop/manager/OracleCompatTest.java 1297783 
  ./src/test/com/cloudera/sqoop/manager/OracleManagerTest.java 1297783 
  ./src/test/com/cloudera/sqoop/mapreduce/db/TestDataDrivenDBInputFormat.java 1297783 

Diff: https://reviews.apache.org/r/4235/diff


Testing
-------

ant test
ant test -Dthirdparty=true

Note that all the Oracle-specific methods in OracleCompatTest are removed, since the same methods in ManagerCompatTest can now be used as for any other DB.


Thanks,

Cheolsoo


                


        

[jira] [Commented] (SQOOP-451) Add format string to select query for Oracle DB

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222153#comment-13222153 ] 

jiraposter@reviews.apache.org commented on SQOOP-451:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4172/
-----------------------------------------------------------

Review request for Sqoop and Bilung Lee.


Summary
-------

Oracle compatibility tests are fragile because the timestamp output format from the DB varies across versions. To test different versions effectively, we should make the output more deterministic.

The changes include:
1) Add format mask to the select query.
2) Update OracleCompatTest accordingly.

Note that the Oracle JDBC driver does not distinguish between date and timestamp, so the same format is applied to date output as to timestamp output. Therefore, the testDate methods in OracleCompatTest expect timestamp output.
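
A minimal sketch of what adding a format mask to the select query could look like, assuming the mask is applied with Oracle's TO_CHAR function. The class name, method name, type-name strings, and the mask itself are assumptions for illustration; the actual change to OracleManager is in the linked diff and is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not the code from the patch.
final class OracleSelectSketch {

    // Assumed mask; it is valid for both DATE and TIMESTAMP columns.
    static final String MASK = "YYYY-MM-DD HH24:MI:SS";

    // Builds a SELECT in which date and timestamp columns are wrapped in
    // TO_CHAR so that they are returned as text in one fixed format.
    static String buildSelect(String table, Map<String, String> columnTypes) {
        StringJoiner cols = new StringJoiner(", ");
        for (Map.Entry<String, String> e : columnTypes.entrySet()) {
            String name = e.getKey();
            String type = e.getValue();
            // The same mask is used for DATE and TIMESTAMP, mirroring the note above.
            if (type.equals("DATE") || type.startsWith("TIMESTAMP")) {
                cols.add("TO_CHAR(" + name + ", '" + MASK + "') AS " + name);
            } else {
                cols.add(name);
            }
        }
        return "SELECT " + cols + " FROM " + table;
    }

    public static void main(String[] args) {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("ID", "NUMBER");
        types.put("CREATED", "DATE");
        System.out.println(buildSelect("EMP", types));
    }
}
```

Because the formatting happens in the database, the text returned no longer depends on how a particular driver or DB version renders timestamps.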


This addresses bug SQOOP-451.
    https://issues.apache.org/jira/browse/SQOOP-451


Diffs
-----

  ./src/java/org/apache/sqoop/manager/OracleManager.java 1296761 
  ./src/test/com/cloudera/sqoop/manager/OracleCompatTest.java 1296761 

Diff: https://reviews.apache.org/r/4172/diff


Testing
-------

ant test
ant test -Dthirdparty=true


Thanks,

Cheolsoo


                


        

[jira] [Commented] (SQOOP-451) Add new options for format masks for date, time, and timestamp

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231751#comment-13231751 ] 

jiraposter@reviews.apache.org commented on SQOOP-451:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4327/
-----------------------------------------------------------

Review request for Sqoop.


Summary
-------

Add new options via which the user can specify format masks for date, time, and timestamp columns:

--date-mask
--time-mask
--timestamp-mask

To format and parse the text going to and from the DB, I am using SimpleDateFormat.

The changes include:

1) Add format mask options as Sqoop common options.
2) Update ClassWriter so that SimpleDateFormat format() call can be generated in the toString() method.
3) Update ClassWriter so that SimpleDateFormat parse() call can be generated in the __loadFromFields() method.
4) Add new tests for import format to ManagerCompatTest and its subclasses.
5) Add new tests for export parse to TestExport and its subclasses.
6) Introduce regular expressions into OracleExportTest to get rid of try-catch blocks.
   (The format mask options do not format direct output from JDBC drivers.)
7) Fix a minor bug in MySQLCompatTest regarding discarded fractional seconds.


This addresses bug SQOOP-451.
    https://issues.apache.org/jira/browse/SQOOP-451


Diffs
-----

  ./src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1301119 
  ./src/java/org/apache/sqoop/SqoopOptions.java 1301119 
  ./src/java/org/apache/sqoop/orm/ClassWriter.java 1301119 
  ./src/java/org/apache/sqoop/tool/BaseSqoopTool.java 1301119 
  ./src/test/com/cloudera/sqoop/TestExport.java 1301119 
  ./src/test/com/cloudera/sqoop/manager/DirectMySQLExportTest.java 1301119 
  ./src/test/com/cloudera/sqoop/manager/JdbcMySQLExportTest.java 1301119 
  ./src/test/com/cloudera/sqoop/manager/MySQLCompatTest.java 1301119 
  ./src/test/com/cloudera/sqoop/manager/OracleCompatTest.java 1301119 
  ./src/test/com/cloudera/sqoop/manager/OracleExportTest.java 1301119 
  ./src/test/com/cloudera/sqoop/testutil/ImportJobTestCase.java 1301119 
  ./src/test/com/cloudera/sqoop/testutil/ManagerCompatTestCase.java 1301119 

Diff: https://reviews.apache.org/r/4327/diff


Testing
-------

- Various format mask tests for import jobs are added to ManagerCompatTest.
- Various format mask tests for export jobs are added to TestExport.
- Ran ant test, ant test -Dthirdparty=true, and ant checkstyle.


Thanks,

Cheolsoo


                
> Add new options for format masks for date, time, and timestamp
> --------------------------------------------------------------
>
>                 Key: SQOOP-451
>                 URL: https://issues.apache.org/jira/browse/SQOOP-451
>             Project: Sqoop
>          Issue Type: Improvement
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>            Priority: Minor
>
> Add new options via which the user can specify format masks for date, time, and timestamp columns.
> The proposal is to add pattern matching code to the toString() method of SqoopRecord so that when the output is written to files, it can be modified accordingly.


        

[jira] [Updated] (SQOOP-451) Add new options for format masks for date, time, and timestamp

Posted by "Cheolsoo Park (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated SQOOP-451:
--------------------------------

    Description: 
Add new options via which the user can specify format masks for date, time, and timestamp columns.

The proposal is to add pattern matching code to the toString() method of SqoopRecord so that when the output is written to files, it can be modified accordingly.


  was:
Add new options via which the user can specify format masks for date, time, and timestamp columns.

The proposal is to add pattern matching code to the toString() method of SqoopRecorder so that when the output is written to files, it can be modified accordingly.


    


        

[jira] [Updated] (SQOOP-451) Add new options for format masks for date, time, and timestamp

Posted by "Cheolsoo Park (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated SQOOP-451:
--------------------------------

    Description: 
Add new options via which the user can specify format masks for date, time, and timestamp columns.

The proposal is to add pattern matching code to the toString() method of SqoopRecorder so that when the output is written to files, it can be modified accordingly.


  was:
Oracle compatibility tests are fragile since the output format of timestamp from the DB varies depending on versions. To test different versions effectively, we should make the output more deterministic.

This issue was originally raised by SQOOP-449, and using regular expressions in compatibility tests was suggested. But after discussion with Arvind and Bilung, we decided to add a format string for the date type to the select query for the Oracle DB so that the output format can be consistent.

        Summary: Add new options for format masks for date, time, and timestamp  (was: Add format string to select query for Oracle DB)
    


        

[jira] [Commented] (SQOOP-451) Add format string to select query for Oracle DB

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224927#comment-13224927 ] 

jiraposter@reviews.apache.org commented on SQOOP-451:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4235/
-----------------------------------------------------------

(Updated 2012-03-08 01:52:58.860527)


Review request for Sqoop, Arvind Prabhakar and Bilung Lee.


Changes
-------

Move fixupColumnTypes() from ClassWriter to OracleManager (i.e., ClassWriter is no longer changed). It seems to fit better in OracleManager since it is Oracle-specific.


Summary
-------

Oracle compatibility tests are fragile because the timestamp output format from the DB varies across versions. To test different versions effectively, we should make the output more deterministic.

This patch will not be submitted unless a new option is added to Sqoop so that the new behavior applies only when that option is enabled. I am posting the patch only to collect feedback.


This addresses bug SQOOP-451.
    https://issues.apache.org/jira/browse/SQOOP-451


Diffs (updated)
---------------

  ./src/java/com/cloudera/sqoop/mapreduce/db/DBConfiguration.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/DBRecordReader.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBRecordReader.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/OracleDBRecordReader.java 1297783 
  ./src/java/com/cloudera/sqoop/mapreduce/db/OracleDataDrivenDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/manager/OracleManager.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/MySQLDumpImportJob.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/MySQLExportJob.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DBConfiguration.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DBInputFormat.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/OracleDBRecordReader.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/OracleDataDrivenDBInputFormat.java 1297783 
  ./src/java/org/apache/sqoop/mapreduce/db/OracleDataDrivenDBRecordReader.java 1297783 
  ./src/test/com/cloudera/sqoop/manager/OracleCompatTest.java 1297783 
  ./src/test/com/cloudera/sqoop/manager/OracleManagerTest.java 1297783 
  ./src/test/com/cloudera/sqoop/mapreduce/db/TestDataDrivenDBInputFormat.java 1297783 

Diff: https://reviews.apache.org/r/4235/diff


Testing
-------

ant test
ant test -Dthirdparty=true

Note that all the Oracle-specific methods in OracleCompatTest are removed, since the same methods in ManagerCompatTest can now be used as for any other DB.


Thanks,

Cheolsoo


                


        

[jira] [Work started] (SQOOP-451) Add new options for format masks for date, time, and timestamp

Posted by "Cheolsoo Park (Work started) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SQOOP-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on SQOOP-451 started by Cheolsoo Park.

