You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Attila Szabo <as...@cloudera.com> on 2016/07/18 19:19:22 UTC

Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/
-----------------------------------------------------------

Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.


Repository: sqoop-trunk


Description
-------

Proposed changes for SQOOP-2983


Diffs
-----

  src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 82e4266 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 

Diff: https://reviews.apache.org/r/50155/diff/


Testing
-------

800 columns with table
100.000 lines (156mb data)
1.000.000 lines (1.56 GB data)
3.000.000 lines (4.5 GB data)


Thanks,

Attila Szabo


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Erzsebet Szilagyi <li...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review144099
-----------------------------------------------------------


Ship it!




Ship It!

- Erzsebet Szilagyi


On July 29, 2016, 1:17 a.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated July 29, 2016, 1:17 a.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
>   src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by David Robson <da...@quest.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review151139
-----------------------------------------------------------



This seems to work now - of course now we have the problem that there is two different code paths that do the same thing a different way so this is not really ideal. I guess it might be best to fix up the quoting issues first before attempting to fix the rest as at the moment it is quite hard to even test other changes as every second thing I try is broken and I am never totally sure if it's broken by new changes or it is just broken on trunk.

- David Robson


On Sept. 6, 2016, 11:06 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2016, 11:06 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
>   src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/util/OracleData.java 8846f65 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Attila Szabo <as...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/
-----------------------------------------------------------

(Updated Sept. 6, 2016, 11:06 p.m.)


Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.


Changes
-------

Dear David,

I tend to agree with you that this review does not head into the right direction, thus regardless your first request I suggest omitting the update related performance fix from this version, and I suggest to handle it as part of a very different JIRA task.

Please review this changes as it is (it's just an earlier version of the code + the more robust test cases).

Many thanks,
Attila


Repository: sqoop-trunk


Description
-------

Proposed changes for SQOOP-2983


Diffs (updated)
-----

  src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
  src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
  src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
  src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
  src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/util/OracleData.java 8846f65 

Diff: https://reviews.apache.org/r/50155/diff/


Testing
-------

800 columns with table
100.000 lines (156mb data)
1.000.000 lines (1.56 GB data)
3.000.000 lines (4.5 GB data)


Thanks,

Attila Szabo


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by David Robson <da...@quest.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review146489
-----------------------------------------------------------



The tests now work for me. This patch definitely makes the quoting issue a lot worse than it currently is on trunk. A lot of tables that used to work now can no longer be exported to at all.
I guess just to get this patch out of the way we could commit it and raise the quoting issue as a separate issue?
This patch also currently breaks update exports completely - I tried various combinations and could not get any scenarios to work that work on trunk. I believe it is again related to the quoting issue as the following is put in the output:

"records were skipped due to a NULL value within one of the update-key column(s)."

Which is to do with OraOopOutputFormatUpdate line 390 I believe:

Object updateKeyValue = fieldMap.get(updateColumnName);

But it works if I run the same test on trunk but not with this patch.

- David Robson


On Aug. 19, 2016, 3:50 a.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated Aug. 19, 2016, 3:50 a.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopConnManager.java 216c771 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
>   src/java/org/apache/sqoop/manager/oracle/OraOopUtilities.java e81588c 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
>   src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OraOopUpdateKeyTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/util/OracleData.java 8846f65 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Attila Szabo <as...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/
-----------------------------------------------------------

(Updated Aug. 19, 2016, 3:50 a.m.)


Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.


Changes
-------

Hi David,

It was a bit tricky to figure out the root cause/nature of both of the problems you ran into, but IMHO these were only configuration related issues on your side.

First of all: on the OraOopTypesTest it seemed to me there was a difference between your and my DB settings in how unicode characters are stored (in my system 16bits but seems from the error message on your system 32bits for the non standard characters), so that was the issue what cause the "value too large for column" problem as in your system 47 varchar2 was needed although on my system 29 was enough. (I've also done a recalculation manually and it gave the very same values). So I've raised the size of those fields to 255, thus we won't face the same issue in the future.

The qouting part: it's working fine, and it's more than necessary (it should had been fixed that time when the whole quoting fix had been introduced to Sqoop trunk), however the source of the problem was that your table was already created in your DB with different case column names (in the current sitution with "PRODUCT_ID" upper cased colum name), and the testcase did not drop/recreate the test table. So I've modified the testcase here as well, and also checked your scenario manually with the very same steps what you've made, it should work with the current version of the changes.

These two were very good findings, because pointed out my new testcases could have been ended in flakey tests, which we definitely would like to avoid, so I'm more then grateful for these two last findings (helped me to make not just the "working code" but the test cases more robust also).

I kindly ask you to review the latest changes again (with the new test cases).


Repository: sqoop-trunk


Description
-------

Proposed changes for SQOOP-2983


Diffs (updated)
-----

  src/java/org/apache/sqoop/manager/oracle/OraOopConnManager.java 216c771 
  src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
  src/java/org/apache/sqoop/manager/oracle/OraOopUtilities.java e81588c 
  src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
  src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
  src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
  src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OraOopUpdateKeyTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/util/OracleData.java 8846f65 

Diff: https://reviews.apache.org/r/50155/diff/


Testing
-------

800 columns with table
100.000 lines (156mb data)
1.000.000 lines (1.56 GB data)
3.000.000 lines (4.5 GB data)


Thanks,

Attila Szabo


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by David Robson <da...@quest.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review145463
-----------------------------------------------------------



Import the TST_PRODUCT table which is created by the automated tests:

import --direct --connect jdbc:oracle:thin:@//host:1521/service --username oraoop --password oraoop --table TST_PRODUCT --target-dir tst_product

Export it back to a new table:

export -Doraoop.template.table=tst_product --direct --connect jdbc:oracle:thin:@//host:1521/service --username oraoop --password oraoop --table TST_PRODUCT_TEST --export-dir tst_product

Then do an update export onto the newly created table:

export --direct --connect jdbc:oracle:thin:@//host:1521/service --username oraoop --password oraoop --table TST_PRODUCT_TEST --export-dir tst_product --update-key PRODUCT_ID

It doesn't actually update any data due to the following warning:

WARN oracle.OraOopOutputFormatUpdate: 346 records were skipped due to a NULL value within one of the update-key column(s).

It looks like this is something to do with the quoting of columns but it was working before this patch. I think the whole table / column quoting is actually quite broken at the moment so we probably need to do further testing here and possibly raise another issue.

The new automated tests fail for me as well - the second one looks like it is caused by this quoting issue. The first one is saying the value is too large - perhaps I am running a different character set to you or something?

Testsuite: org.apache.sqoop.manager.oracle.OraOopTypesTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.532 sec

Testcase: ensureTypesAfterExportMappedAsExpected took 2.519 sec
        Caused an ERROR
ORA-12899: value too large for column "SQOOPTEST"."ORACLE_DATATYPES_TEMPLATE"."C11_CHAR" (actual: 47, maximum: 29)

java.sql.SQLException: ORA-12899: value too large for column "SQOOPTEST"."ORACLE_DATATYPES_TEMPLATE"."C11_CHAR" (actual: 47, maximum: 29)

        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1059)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:522)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:30)
        at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:931)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1150)
        at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1792)
        at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1745)
        at oracle.jdbc.driver.OracleStatementWrapper.execute(OracleStatementWrapper.java:334)
        at org.apache.sqoop.manager.oracle.OraOopTypesTest.ensureTypesAfterExportMappedAsExpected(OraOopTypesTest.java:39)

Testsuite: org.apache.sqoop.manager.oracle.OraOopUpdateKeyTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.195 sec

java.lang.Exception: java.io.IOException: java.sql.SQLSyntaxErrorException: ORA-00904: "B"."product_id": invalid identifier

        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:399)
Caused by: java.io.IOException: java.sql.SQLSyntaxErrorException: ORA-00904: "B"."product_id": invalid identifier

        at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:211)
        at org.apache.sqoop.manager.oracle.OraOopOutputFormatBase$OraOopDBRecordWriterBase.close(OraOopOutputFormatBase.java:555)
        at org.apache.sqoop.manager.oracle.OraOopOutputFormatUpdate$OraOopDBRecordWriterUpdate.close(OraOopOutputFormatUpdate.java:113)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:641)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:762)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "B"."product_id": invalid identifier

- David Robson


On Aug. 9, 2016, 10:50 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2016, 10:50 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopConnManager.java 216c771 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
>   src/java/org/apache/sqoop/manager/oracle/OraOopUtilities.java e81588c 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
>   src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OraOopUpdateKeyTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Attila Szabo <as...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/
-----------------------------------------------------------

(Updated Aug. 9, 2016, 10:50 p.m.)


Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.


Changes
-------

With the help of David Robson I was able to identify one issue around "update-key" option, and also was able to spot another issue (left behind after the changes around Oracle escaped column name support). Both of them are fixed. New test case attached as well. New diff reflects all of the changes. Please do another round of review!


Repository: sqoop-trunk


Description
-------

Proposed changes for SQOOP-2983


Diffs (updated)
-----

  src/java/org/apache/sqoop/manager/oracle/OraOopConnManager.java 216c771 
  src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
  src/java/org/apache/sqoop/manager/oracle/OraOopUtilities.java e81588c 
  src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
  src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
  src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
  src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OraOopUpdateKeyTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50155/diff/


Testing
-------

800 columns with table
100.000 lines (156mb data)
1.000.000 lines (1.56 GB data)
3.000.000 lines (4.5 GB data)


Thanks,

Attila Szabo


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Attila Szabo <as...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/
-----------------------------------------------------------

(Updated July 28, 2016, 11:17 p.m.)


Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.


Changes
-------

Automated 3rd party test added to ensure the datatypes/datatype mappings are works as before (and as intended)


Repository: sqoop-trunk


Description
-------

Proposed changes for SQOOP-2983


Diffs (updated)
-----

  src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
  src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
  src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
  src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
  src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
  src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50155/diff/


Testing
-------

800 columns with table
100.000 lines (156mb data)
1.000.000 lines (1.56 GB data)
3.000.000 lines (4.5 GB data)


Thanks,

Attila Szabo


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by SenthilBharani Bogana Vijay <bv...@gmail.com>.
6nRrw6
On Jul 18, 2016 12:19 PM, "Attila Szabo" <as...@cloudera.com> wrote:
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
>
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
>
>
> Repository: sqoop-trunk
>
>
> Description
> -------
>
> Proposed changes for SQOOP-2983
>
>
> Diffs
> -----
>
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java
82e4266
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java
d5eebf4
>
> Diff: https://reviews.apache.org/r/50155/diff/
>
>
> Testing
> -------
>
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
>
>
> Thanks,
>
> Attila Szabo
>

Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by David Robson <da...@quest.com>.

> On July 19, 2016, 5:06 a.m., David Robson wrote:
> > src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java, line 244
> > <https://reviews.apache.org/r/50155/diff/1/?file=1446151#file1446151line244>
> >
> >     Have you done extensive testing with all data types for this change? Originally Sqoop didn't work too well with Oracle data types which is why there is code here to do different things with bind variables based on the data type. Also this means there will now be a different code path for update/merge export jobs compared to insert jobs so I think it would be best to fix it in OraOopOutputFormatBase if you want to improve the performance then the new code can be used for all job types.
> 
> Attila Szabo wrote:
>     Hi Dave,
>     
>     Thanks for you invaluable feedback. I've been also considering do the fix a level above to have the same execution path for insert/update/merge, I was just not confident enough if this change should affect those parts as well. As you've advised that too, let me provide a new version of patch soon.
>     
>     On the types front:
>     Could you please give me a few concrete example which types caused problems in the past. In that case I would be able to add a more serious testing around those once

OraOopOutputFormatBase.configurePreparedStatementColumns calls setBindValueAtName which has the code related to this. This was written a long time ago now so perhaps Sqoop has been updated since then to cope better with various data types and it might not be needed anymore (or could be refactored to be much faster). The main ones are the timestamp related columns - Oracle stores dates and timezones differently to Sqoop so we mapped these as a String which overcomes most of the problems. Of course some users wanted to still map these as Timestamp so OraOop has an option for this. There is also some code in there for binary floats and binary doubles - not sure if it's still needed or not.
I guess the fact there is a fair bit of code in here that is called for every row that loops through every column is not ideal, so if the simpler way works then that would be good.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review142693
-----------------------------------------------------------


On July 18, 2016, 7:19 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 7:19 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 82e4266 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Attila Szabo <as...@cloudera.com>.

> On July 19, 2016, 5:06 a.m., David Robson wrote:
> > src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java, line 972
> > <https://reviews.apache.org/r/50155/diff/1/?file=1446150#file1446150line972>
> >
> >     This should be fine to disable the validation to improve performance as we should have already inserted into the correct staging tables.

I've had the same thoughts! Thank you Dave for confirming this!


> On July 19, 2016, 5:06 a.m., David Robson wrote:
> > src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java, line 244
> > <https://reviews.apache.org/r/50155/diff/1/?file=1446151#file1446151line244>
> >
> >     Have you done extensive testing with all data types for this change? Originally Sqoop didn't work too well with Oracle data types which is why there is code here to do different things with bind variables based on the data type. Also this means there will now be a different code path for update/merge export jobs compared to insert jobs so I think it would be best to fix it in OraOopOutputFormatBase if you want to improve the performance then the new code can be used for all job types.

Hi Dave,

Thanks for you invaluable feedback. I've been also considering do the fix a level above to have the same execution path for insert/update/merge, I was just not confident enough if this change should affect those parts as well. As you've advised that too, let me provide a new version of patch soon.

On the types front:
Could you please give me a few concrete example which types caused problems in the past. In that case I would be able to add a more serious testing around those once


- Attila


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review142693
-----------------------------------------------------------


On July 18, 2016, 7:19 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 7:19 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 82e4266 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by Attila Szabo <as...@cloudera.com>.

> On July 19, 2016, 5:06 a.m., David Robson wrote:
> > src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java, line 244
> > <https://reviews.apache.org/r/50155/diff/1/?file=1446151#file1446151line244>
> >
> >     Have you done extensive testing with all data types for this change? Originally Sqoop didn't work too well with Oracle data types which is why there is code here to do different things with bind variables based on the data type. Also this means there will now be a different code path for update/merge export jobs compared to insert jobs so I think it would be best to fix it in OraOopOutputFormatBase if you want to improve the performance then the new code can be used for all job types.
> 
> Attila Szabo wrote:
>     Hi Dave,
>     
>     Thanks for you invaluable feedback. I've been also considering do the fix a level above to have the same execution path for insert/update/merge, I was just not confident enough if this change should affect those parts as well. As you've advised that too, let me provide a new version of patch soon.
>     
>     On the types front:
>     Could you please give me a few concrete example which types caused problems in the past. In that case I would be able to add a more serious testing around those once
> 
> David Robson wrote:
>     OraOopOutputFormatBase.configurePreparedStatementColumns calls setBindValueAtName which has the code related to this. This was written a long time ago now so perhaps Sqoop has been updated since then to cope better with various data types and it might not be needed anymore (or could be refactored to be much faster). The main ones are the timestamp related columns - Oracle stores dates and timezones differently to Sqoop so we mapped these as a String which overcomes most of the problems. Of course some users wanted to still map these as Timestamp so OraOop has an option for this. There is also some code in there for binary floats and binary doubles - not sure if it's still needed or not.
>     I guess the fact there is a fair bit of code in here that is called for every row that loops through every column is not ideal, so if the simpler way works then that would be good.

Hi Dave,

In the new patchfile I've made the changes you've requested and also added an automated 3rd party testcase to ensure type mappings. Could you please review that version too, and give a Ship It! if everything is fine?

Thanks,
Attila


- Attila


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review142693
-----------------------------------------------------------


On July 28, 2016, 11:17 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 11:17 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 6b27bd8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatBase.java 8f94cf8 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatUpdate.java a33768f 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 9d91887 
>   src/test/org/apache/sqoop/manager/oracle/ExportTest.java 991b221 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTestCase.java 9fe4821 
>   src/test/org/apache/sqoop/manager/oracle/OraOopTypesTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>


Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables

Posted by David Robson <da...@quest.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review142693
-----------------------------------------------------------




src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java (line 972)
<https://reviews.apache.org/r/50155/#comment208334>

    This should be fine to disable the validation to improve performance as we should have already inserted into the correct staging tables.



src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java (line 242)
<https://reviews.apache.org/r/50155/#comment208333>

    Have you done extensive testing with all data types for this change? Originally Sqoop didn't work too well with Oracle data types which is why there is code here to do different things with bind variables based on the data type. Also this means there will now be a different code path for update/merge export jobs compared to insert jobs so I think it would be best to fix it in OraOopOutputFormatBase if you want to improve the performance then the new code can be used for all job types.


- David Robson


On July 18, 2016, 7:19 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 7:19 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 82e4266 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> 800 columns with table
> 100.000 lines (156mb data)
> 1.000.000 lines (1.56 GB data)
> 3.000.000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>