You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Fero Szabo via Review Board <no...@reviews.apache.org> on 2018/02/12 13:08:11 UTC

Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/
-----------------------------------------------------------

Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-2976
    https://issues.apache.org/jira/browse/SQOOP-2976


Repository: sqoop-trunk


Description
-------

**Summary:**
Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 

Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.

Take the following SQL commands for an example: 
```
create table salary (id int, amount number (10,5));
insert into salary (id, amount) values (1, 10.5);
insert into salary (id, amount) values (2, 10.50);
select * from salary;
```
Records in an Oracle database:
1	10.5
2	10.5

Records in SQL Server (using decimal instead of number in the create statement):
1	10.50000
2	10.50000

**Solition:**
The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 

**Concerns: **
- trimmings can happen silently, should we rather raise an exception? Enabling trimming adds a new feature, but it also adds the possibility silently lose scale while import. The latter could be mitigated by a thorough documentation.
- The flags current name () doesn't really match the behavior, should I change it to something else? (avro.decimal_scale_harmonization.enable)
- How / where to document this new flag?


**Other notable changes:**
- Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
- Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*


Diffs
-----

  src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
  src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
  src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 


Diff: https://reviews.apache.org/r/65607/diff/1/


Testing
-------

See the 3 new test classes (HSQLDB, Oracle, SQL Server).


Thanks,

Fero Szabo


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197274
-----------------------------------------------------------



Hi Fero,

Thank you very much for this improvement, I really appreciate that you added this many new test cases!

Regarding the flag naming I would use something like avro.decimal_scale_matching.enable as harmonization is too generic for me. I would like to see other's opinion about this too, also please be aware that in case of changing from "padding" you should change all the related method names too.

Unfortunately the documentation is not that detailed regarding Avro file format but maybe you could add a subsection under "File Formats" section in src/docs/user/import.txt about this flag and it's behavior.

I ran unit and third party tests with your patch successfuly.

Although, I have couple of comments regarding your change (which is quite big), please find them below.

Thank you,
Bogi


src/java/org/apache/sqoop/config/ConfigurationConstants.java
Lines 109 (patched)
<https://reviews.apache.org/r/65607/#comment277398>

    Please update this comment too (I assume it is from copy+paste).



src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java
Lines 52-53 (patched)
<https://reviews.apache.org/r/65607/#comment277400>

    Could you please remove the whitespace between String and []?



src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java
Lines 70 (patched)
<https://reviews.apache.org/r/65607/#comment277401>

    Could we have a more descriptive error message at the negative case maybe?



src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
Lines 98-101 (patched)
<https://reviews.apache.org/r/65607/#comment277527>

    Do we really need this here?



src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
Lines 107-108 (patched)
<https://reviews.apache.org/r/65607/#comment277402>

    Could you please clean up comments like these?



src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
Lines 131-134 (patched)
<https://reviews.apache.org/r/65607/#comment277404>

    Where does this javadoc come from?



src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
Lines 144-147 (patched)
<https://reviews.apache.org/r/65607/#comment277405>

    Where does this javadoc come from?



src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java
Lines 70 (patched)
<https://reviews.apache.org/r/65607/#comment277534>

    Do we really need to reimplement this method? Couldn't you use the one in BaseSqoopTestCase instead?



src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java
Lines 71 (patched)
<https://reviews.apache.org/r/65607/#comment277406>

    Why stmt3?



src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java
Lines 109 (patched)
<https://reviews.apache.org/r/65607/#comment277535>

    Same question here.



src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java
Lines 45-60 (patched)
<https://reviews.apache.org/r/65607/#comment277536>

    This has already been documented in the COMPILING.txt so I don't think it is necessary here. Please take a look.


- Boglarka Egyed


On Feb. 12, 2018, 2:31 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2018, 2:31 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Concerns: **
> - trimmings can happen silently, should we rather raise an exception? Enabling trimming adds a new feature, but it also adds the possibility silently lose scale while import. The latter could be mitigated by a thorough documentation.
> - The flags current name () doesn't really match the behavior, should I change it to something else? (avro.decimal_scale_harmonization.enable)
> - How / where to document this new flag?
> 
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/2/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.

> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/java/org/apache/sqoop/avro/AvroUtil.java
> > Lines 88 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957645#file1957645line88>
> >
> >     I think this if statement is redundant since if schemaContainingScale is null then the method will return null anyway.

Nope, this is there because we are looking for the first non-null value returned by getDecimalSchema. Here is an example schema where this comes into play:
{"type" : "union", "value" : "{\"type\" : \"null\"}, {\"type\" : \"null\", \"precision\" : 10, \"scale\" : 5}"}

If we would remove the if statement, then getDecimalSchema would return null.

Maybe, I should have added 
```
else {
  continue;
}
```
for verbosity?


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
> > Lines 101 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957649#file1957649line101>
> >
> >     Do we really need this file? It seems we do not use dates in this test case.

Nope, removed.


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java
> > Lines 93 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957651#file1957651line93>
> >
> >     A similar logic is already present in org.apache.sqoop.testutil.BaseSqoopTestCase can we somehow reuse that logic?
> >     For example it already has a ConnManager field which can be initialized in org.apache.sqoop.testutil.BaseSqoopTestCase#setUp

Thanks for pointing this one out. In the end I figured out how to reuse the table creation and insertion logic from BaseSqoopTestCase as well.


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java
> > Lines 102 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957652#file1957652line102>
> >
> >     I think this logic handling the tool options could be also moved to ArgumentUtils since it already contains similar stuff.
> >     After that this method could be simplified, the notEmpty checks could also be removed.
> >     
> >     We could also think about moving all the logic from ArgumentUtils to the builder since it would be a better practice to use the builder instead of the static methods.

I think the latter makes more sense, so I've moved the functions from ArgumentUtils to the builder and also replaced every call of ArgumentUtil's functions to use the new builder.


- Fero


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197269
-----------------------------------------------------------


On Feb. 12, 2018, 2:31 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2018, 2:31 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Concerns: **
> - trimmings can happen silently, should we rather raise an exception? Enabling trimming adds a new feature, but it also adds the possibility silently lose scale while import. The latter could be mitigated by a thorough documentation.
> - The flags current name () doesn't really match the behavior, should I change it to something else? (avro.decimal_scale_harmonization.enable)
> - How / where to document this new flag?
> 
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/2/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Szabolcs Vasas <va...@gmail.com>.

> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/java/org/apache/sqoop/avro/AvroUtil.java
> > Lines 88 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957645#file1957645line88>
> >
> >     I think this if statement is redundant since if schemaContainingScale is null then the method will return null anyway.
> 
> Fero Szabo wrote:
>     Nope, this is there because we are looking for the first non-null value returned by getDecimalSchema. Here is an example schema where this comes into play:
>     {"type" : "union", "value" : "{\"type\" : \"null\"}, {\"type\" : \"null\", \"precision\" : 10, \"scale\" : 5}"}
>     
>     If we would remove the if statement, then getDecimalSchema would return null.
>     
>     Maybe, I should have added 
>     ```
>     else {
>       continue;
>     }
>     ```
>     for verbosity?

No, sorry, I have missed the for loop, it looks good.


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197269
-----------------------------------------------------------


On Feb. 13, 2018, 4:13 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2018, 4:13 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Notes: **
> - trimmings cannot happen, as we generate our schema based on the database columns. Therefore, the values passed to AvroUtil#toAvro will have less or equal scale as the avro schema.
> - I will open a follow-up jira to document this change.
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/3/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197269
-----------------------------------------------------------




src/java/org/apache/sqoop/avro/AvroUtil.java
Lines 88 (patched)
<https://reviews.apache.org/r/65607/#comment277545>

    I think this if statement is redundant since if schemaContainingScale is null then the method will return null anyway.



src/java/org/apache/sqoop/avro/AvroUtil.java
Lines 93 (patched)
<https://reviews.apache.org/r/65607/#comment277544>

    Can we extract this string value to a constant?



src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
Lines 101 (patched)
<https://reviews.apache.org/r/65607/#comment277540>

    Do we really need this file? It seems we do not use dates in this test case.



src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
Lines 108 (patched)
<https://reviews.apache.org/r/65607/#comment277541>

    I think we can delete these lines.



src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java
Lines 93 (patched)
<https://reviews.apache.org/r/65607/#comment277538>

    A similar logic is already present in org.apache.sqoop.testutil.BaseSqoopTestCase can we somehow reuse that logic?
    For example it already has a ConnManager field which can be initialized in org.apache.sqoop.testutil.BaseSqoopTestCase#setUp



src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java
Lines 99 (patched)
<https://reviews.apache.org/r/65607/#comment277537>

    Do we really need this property to be set? It seems to be initialized to the default factor classes which are used by default anyway, aren't they?



src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java
Lines 145 (patched)
<https://reviews.apache.org/r/65607/#comment277539>

    Can we make these test methods names similar to the ones defined in TestHsqldbAvroPadding (testAvroImportWithoutPadding, testAvroImportWithPadding)



src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java
Lines 81 (patched)
<https://reviews.apache.org/r/65607/#comment277394>

    Could we introduce a parameter-less version of this which would set the withHadoopCommons to true? I think most of the time this would be called to set the flag to true so could simplify the code on the client side.



src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java
Lines 82 (patched)
<https://reviews.apache.org/r/65607/#comment277395>

    I think withHadoopFlags or withCommonHadoopFlags would be more descriptive name for this field.



src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java
Lines 87 (patched)
<https://reviews.apache.org/r/65607/#comment277396>

    typo: Transforms



src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java
Lines 102 (patched)
<https://reviews.apache.org/r/65607/#comment277397>

    I think this logic handling the tool options could be also moved to ArgumentUtils since it already contains similar stuff.
    After that this method could be simplified, the notEmpty checks could also be removed.
    
    We could also think about moving all the logic from ArgumentUtils to the builder since it would be a better practice to use the builder instead of the static methods.



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 60 (patched)
<https://reviews.apache.org/r/65607/#comment277529>

    This variable does not seem to be used.



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 64 (patched)
<https://reviews.apache.org/r/65607/#comment277531>

    I think instead of 'r' this reader should be closed finally. Can we use a try-with-resources statement here?



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 67 (patched)
<https://reviews.apache.org/r/65607/#comment277532>

    We could convert the comment message to a parameter of fail() so it's more obvious for the test runner why the test has failed.



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 73 (patched)
<https://reviews.apache.org/r/65607/#comment277525>

    Please remove System.out.println



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 77 (patched)
<https://reviews.apache.org/r/65607/#comment277528>

    I think we could just let this method throw an IOException  (or rethrow it in a RuntimeException) and JUnit will fail the test anyway and log the stack trace.



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 94 (patched)
<https://reviews.apache.org/r/65607/#comment277533>

    This seems to be a dupicate of org.apache.sqoop.TestAvroImport#read can we resolve it somehow?



src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java
Lines 434 (patched)
<https://reviews.apache.org/r/65607/#comment277399>

    This documentation seems to be duplicated now, add the extra parameter or just delete it :)


- Szabolcs Vasas


On Feb. 12, 2018, 2:31 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2018, 2:31 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Concerns: **
> - trimmings can happen silently, should we rather raise an exception? Enabling trimming adds a new feature, but it also adds the possibility silently lose scale while import. The latter could be mitigated by a thorough documentation.
> - The flags current name () doesn't really match the behavior, should I change it to something else? (avro.decimal_scale_harmonization.enable)
> - How / where to document this new flag?
> 
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/2/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197413
-----------------------------------------------------------



Hi Feró,

Thank you for fixing the findings so quickly! I have found a few more small things, please take a look.


src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java
Lines 152 (patched)
<https://reviews.apache.org/r/65607/#comment277572>

    It seems this method is not used, do we need it here?



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 30 (patched)
<https://reviews.apache.org/r/65607/#comment277573>

    Unused import.



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 31 (patched)
<https://reviews.apache.org/r/65607/#comment277574>

    Unused import.



src/test/org/apache/sqoop/testutil/AvroTestUtils.java
Lines 33 (patched)
<https://reviews.apache.org/r/65607/#comment277575>

    Unused import.


- Szabolcs Vasas


On Feb. 13, 2018, 4:13 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2018, 4:13 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Notes: **
> - trimmings cannot happen, as we generate our schema based on the database columns. Therefore, the values passed to AvroUtil#toAvro will have less or equal scale as the avro schema.
> - I will open a follow-up jira to document this change.
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/3/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197504
-----------------------------------------------------------


Ship it!




Hi Fero, 

Thank you for all the reivew finding corrections!

Tests still pass with your patch and it looks good to me know.

Please don't forget to open a separate JIRA for the documentation part.

Many thanks,
Bogi

- Boglarka Egyed


On Feb. 14, 2018, 9:45 a.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2018, 9:45 a.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Notes: **
> - trimmings cannot happen, as we generate our schema based on the database columns. Therefore, the values passed to AvroUtil#toAvro will have less or equal scale as the avro schema.
> - I will open a follow-up jira to document this change.
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/5/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197521
-----------------------------------------------------------


Ship it!




Hi Feró,

Thank you for fixing all the findings, it looks good now.

+1 for the separate doc JIRA

- Szabolcs Vasas


On Feb. 14, 2018, 9:45 a.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 14, 2018, 9:45 a.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 
> 
> **Notes: **
> - trimmings cannot happen, as we generate our schema based on the database columns. Therefore, the values passed to AvroUtil#toAvro will have less or equal scale as the avro schema.
> - I will open a follow-up jira to document this change.
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/5/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/
-----------------------------------------------------------

(Updated Feb. 14, 2018, 9:45 a.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-2976
    https://issues.apache.org/jira/browse/SQOOP-2976


Repository: sqoop-trunk


Description
-------

**Summary:**
Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 

Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.

Take the following SQL commands for an example: 
```
create table salary (id int, amount number (10,5));
insert into salary (id, amount) values (1, 10.5);
insert into salary (id, amount) values (2, 10.50);
select * from salary;
```
Records in an Oracle database:
1	10.5
2	10.5

Records in SQL Server (using decimal instead of number in the create statement):
1	10.50000
2	10.50000

**Solition:**
The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 

**Notes: **
- trimmings cannot happen, as we generate our schema based on the database columns. Therefore, the values passed to AvroUtil#toAvro will have less or equal scale as the avro schema.
- I will open a follow-up jira to document this change.

**Other notable changes:**
- Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
- Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*


Diffs (updated)
-----

  src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
  src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
  src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
  src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
  src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 


Diff: https://reviews.apache.org/r/65607/diff/4/

Changes: https://reviews.apache.org/r/65607/diff/3-4/


Testing
-------

See the 3 new test classes (HSQLDB, Oracle, SQL Server).


Thanks,

Fero Szabo


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/
-----------------------------------------------------------

(Updated Feb. 13, 2018, 4:13 p.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-2976
    https://issues.apache.org/jira/browse/SQOOP-2976


Repository: sqoop-trunk


Description (updated)
-------

**Summary:**
Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 

Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.

Take the following SQL commands for an example: 
```
create table salary (id int, amount number (10,5));
insert into salary (id, amount) values (1, 10.5);
insert into salary (id, amount) values (2, 10.50);
select * from salary;
```
Records in an Oracle database:
1	10.5
2	10.5

Records in SQL Server (using decimal instead of number in the create statement):
1	10.50000
2	10.50000

**Solition:**
The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 

**Notes: **
- trimmings cannot happen, as we generate our schema based on the database columns. Therefore, the values passed to AvroUtil#toAvro will have less or equal scale as the avro schema.
- I will open a follow-up jira to document this change.

**Other notable changes:**
- Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
- Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*


Diffs
-----

  src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
  src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
  src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
  src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
  src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 


Diff: https://reviews.apache.org/r/65607/diff/3/


Testing
-------

See the 3 new test classes (HSQLDB, Oracle, SQL Server).


Thanks,

Fero Szabo


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/
-----------------------------------------------------------

(Updated Feb. 13, 2018, 4:07 p.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-2976
    https://issues.apache.org/jira/browse/SQOOP-2976


Repository: sqoop-trunk


Description
-------

**Summary:**
Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 

Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.

Take the following SQL commands for an example: 
```
create table salary (id int, amount number (10,5));
insert into salary (id, amount) values (1, 10.5);
insert into salary (id, amount) values (2, 10.50);
select * from salary;
```
Records in an Oracle database:
1	10.5
2	10.5

Records in SQL Server (using decimal instead of number in the create statement):
1	10.50000
2	10.50000

**Solition:**
The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 

**Concerns: **
- trimmings can happen silently, should we rather raise an exception? Enabling trimming adds a new feature, but it also adds the possibility silently lose scale while import. The latter could be mitigated by a thorough documentation.
- The flags current name () doesn't really match the behavior, should I change it to something else? (avro.decimal_scale_harmonization.enable)
- How / where to document this new flag?


**Other notable changes:**
- Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
- Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*


Diffs (updated)
-----

  src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
  src/test/org/apache/sqoop/TestAvroImport.java 1172fc5d 
  src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
  src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/metastore/TestMetastoreConfigurationParameters.java 391dc336 
  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/ArgumentUtils.java 2f95e455 
  src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 


Diff: https://reviews.apache.org/r/65607/diff/3/

Changes: https://reviews.apache.org/r/65607/diff/2-3/


Testing
-------

See the 3 new test classes (HSQLDB, Oracle, SQL Server).


Thanks,

Fero Szabo


Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/
-----------------------------------------------------------

(Updated Feb. 12, 2018, 2:31 p.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-2976
    https://issues.apache.org/jira/browse/SQOOP-2976


Repository: sqoop-trunk


Description
-------

**Summary:**
Certain databases, such as SQL Server and Postgres are storing decimal values padded with 0s, should the user insert them with less digits than the given scale. 

Other databases however, such as Oracle and HSQLDB store these numbers without trailing 0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in the avro schema.

Take the following SQL commands for an example: 
```
create table salary (id int, amount number (10,5));
insert into salary (id, amount) values (1, 10.5);
insert into salary (id, amount) values (2, 10.50);
select * from salary;
```
Records in an Oracle database:
1	10.5
2	10.5

Records in SQL Server (using decimal instead of number in the create statement):
1	10.50000
2	10.50000

**Solition:**
The fix is simply checking the scale of the returned BigDecimals against what's in the avro schema and recreates the objects in case of a mismatch. I've introduced a new property to enable this new feature, so existing behavior is not affected. 

**Concerns: **
- trimmings can happen silently, should we rather raise an exception? Enabling trimming adds a new feature, but it also adds the possibility silently lose scale while import. The latter could be mitigated by a thorough documentation.
- The flags current name () doesn't really match the behavior, should I change it to something else? (avro.decimal_scale_harmonization.enable)
- How / where to document this new flag?


**Other notable changes:**
- Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces a useful builder pattern for creating commandline arguments for tests.
- Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would be required in this class to enable better reuse. For example: the current implementation can't be used with SQL Server, because one also needs to specify the schema besides the tablename in the create and insert statements. There are also code duplications.)*


Diffs (updated)
-----

  src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
  src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
  src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
  src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 


Diff: https://reviews.apache.org/r/65607/diff/2/

Changes: https://reviews.apache.org/r/65607/diff/1-2/


Testing
-------

See the 3 new test classes (HSQLDB, Oracle, SQL Server).


Thanks,

Fero Szabo