You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Fero Szabo via Review Board <no...@reviews.apache.org> on 2018/11/08 15:34:10 UTC

Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/
-----------------------------------------------------------

(Updated Nov. 8, 2018, 3:34 p.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Changes
-------

The most notable change in this newest changesetis around verification: 
- as we discussed with Szabi: since decimal conversion is put in place it's not enough to assert on the data itself (as the data appears as String in-memory, it could be String on the disk as well)
- so, I've added a new function that checks the schema for decimal types as well


Bugs: SQOOP-3382
    https://issues.apache.org/jira/browse/SQOOP-3382


Repository: sqoop-trunk


Description
-------

This patch is about adding support for fixed point decimal types in parquet import.

The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.

For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.

I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.

** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
- I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
- However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
- The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
_Please share your thoughts on this!_


Diffs (updated)
-----

  src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 80c069888 
  src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
  src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
  src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
  src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
  src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
  src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
  src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
  src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
  src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
  src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
  src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
  src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
  src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
  src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
  src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 


Diff: https://reviews.apache.org/r/69060/diff/3/

Changes: https://reviews.apache.org/r/69060/diff/2-3/


Testing
-------

3rd party tests and unit tests, both gradle and ant


Thanks,

Fero Szabo


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/#review210533
-----------------------------------------------------------


Ship it!




Thanks for the updates Fero, let's ship this!

- Boglarka Egyed


On Nov. 12, 2018, 4:33 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69060/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2018, 4:33 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3382
>     https://issues.apache.org/jira/browse/SQOOP-3382
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This patch is about adding support for fixed point decimal types in parquet import.
> 
> The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.
> 
> For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.
> 
> I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.
> 
> ** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
> - I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
> - However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
> - The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
> _Please share your thoughts on this!_
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java e82154309 
>   src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
>   src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
>   src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
>   src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
>   src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
>   src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 
> 
> 
> Diff: https://reviews.apache.org/r/69060/diff/4/
> 
> 
> Testing
> -------
> 
> 3rd party tests and unit tests, both gradle and ant
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/#review210519
-----------------------------------------------------------


Ship it!




Hi Feró,

Thank you for fixing all the findings, I have ran the unit and third party tests with the latest patch and all of them passed.
Ship it!

- Szabolcs Vasas


On Nov. 12, 2018, 4:33 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69060/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2018, 4:33 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3382
>     https://issues.apache.org/jira/browse/SQOOP-3382
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This patch is about adding support for fixed point decimal types in parquet import.
> 
> The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.
> 
> For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.
> 
> I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.
> 
> ** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
> - I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
> - However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
> - The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
> _Please share your thoughts on this!_
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java e82154309 
>   src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
>   src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
>   src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
>   src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
>   src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
>   src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 
> 
> 
> Diff: https://reviews.apache.org/r/69060/diff/4/
> 
> 
> Testing
> -------
> 
> 3rd party tests and unit tests, both gradle and ant
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/
-----------------------------------------------------------

(Updated Nov. 12, 2018, 4:33 p.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-3382
    https://issues.apache.org/jira/browse/SQOOP-3382


Repository: sqoop-trunk


Description
-------

This patch is about adding support for fixed point decimal types in parquet import.

The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.

For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.

I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.

** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
- I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
- However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
- The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
_Please share your thoughts on this!_


Diffs (updated)
-----

  src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
  src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
  src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java e82154309 
  src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
  src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
  src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
  src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
  src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
  src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
  src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
  src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
  src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
  src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
  src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
  src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
  src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
  src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 


Diff: https://reviews.apache.org/r/69060/diff/4/

Changes: https://reviews.apache.org/r/69060/diff/3-4/


Testing
-------

3rd party tests and unit tests, both gradle and ant


Thanks,

Fero Szabo


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/#review210437
-----------------------------------------------------------




src/java/org/apache/sqoop/config/ConfigurationConstants.java
Lines 100 (patched)
<https://reviews.apache.org/r/69060/#comment295097>

    This should be 'Enable parquet logical types...'



src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java
Line 69 (original), 79 (patched)
<https://reviews.apache.org/r/69060/#comment295103>

    I have submitted some suggestions to this test to my fork: https://github.com/szvasas/sqoop/commit/3f1fb5846446023830903770c6cc4156337de141
    please take a look



src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java
Lines 21 (patched)
<https://reviews.apache.org/r/69060/#comment295101>

    I think this interface could extend ImportJobTestConfiguration that would simplify some test cases.



src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java
Lines 21 (patched)
<https://reviews.apache.org/r/69060/#comment295102>

    I think this interface could extend ImportJobTestConfiguration that would simplify some test cases.


- Szabolcs Vasas


On Nov. 8, 2018, 3:34 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69060/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 3:34 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3382
>     https://issues.apache.org/jira/browse/SQOOP-3382
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This patch is about adding support for fixed point decimal types in parquet import.
> 
> The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.
> 
> For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.
> 
> I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.
> 
> ** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
> - I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
> - However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
> - The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
> _Please share your thoughts on this!_
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 80c069888 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
>   src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
>   src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
>   src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
>   src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
>   src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
>   src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 
> 
> 
> Diff: https://reviews.apache.org/r/69060/diff/3/
> 
> 
> Testing
> -------
> 
> 3rd party tests and unit tests, both gradle and ant
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Boglarka Egyed <bo...@apache.org>.

> On Nov. 9, 2018, 2:26 p.m., Boglarka Egyed wrote:
> > src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java
> > Lines 220 (patched)
> > <https://reviews.apache.org/r/69060/diff/3/?file=2106486#file2106486line224>
> >
> >     I think these tests could be parameterized as they are doing the same but with different file formats (Avro and Parquet).
> 
> Fero Szabo wrote:
>     Hi Bogi,
>     
>     Thanks for the review!
>     
>     There is a tiny difference: to enable logical types in parquet, there is new flag (sqoop.parquet.logical_types.decimal.enable), i.e. only used in the parquet tests. 
>     
>     I'd keep this code as is, as deduplication might lead to spaghetti code here (since these are different features after all).
>     
>     Even though this is a bit of a compromise, I'd like to drop this issue if that's OK with you (?)

Thanks for pointing this out, Fero. I think you are right about creating spaghetti code here, please feel free to drop my previous comment. Also, the tests are now easy-to-read which is good as it is.


- Boglarka


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/#review210439
-----------------------------------------------------------


On Nov. 12, 2018, 4:33 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69060/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2018, 4:33 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3382
>     https://issues.apache.org/jira/browse/SQOOP-3382
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This patch is about adding support for fixed point decimal types in parquet import.
> 
> The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.
> 
> For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.
> 
> I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.
> 
> ** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
> - I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
> - However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
> - The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
> _Please share your thoughts on this!_
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java e82154309 
>   src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
>   src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
>   src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
>   src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
>   src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
>   src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 
> 
> 
> Diff: https://reviews.apache.org/r/69060/diff/4/
> 
> 
> Testing
> -------
> 
> 3rd party tests and unit tests, both gradle and ant
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.

> On Nov. 9, 2018, 2:26 p.m., Boglarka Egyed wrote:
> > src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java
> > Lines 220 (patched)
> > <https://reviews.apache.org/r/69060/diff/3/?file=2106486#file2106486line224>
> >
> >     I think these tests could be parameterized as they are doing the same but with different file formats (Avro and Parquet).

Hi Bogi,

Thanks for the review!

There is a tiny difference: to enable logical types in parquet, there is new flag (sqoop.parquet.logical_types.decimal.enable), i.e. only used in the parquet tests. 

I'd keep this code as is, as deduplication might lead to spaghetti code here (since these are different features after all).

Even though this is a bit of a compromise, I'd like to drop this issue if that's OK with you (?)


- Fero


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/#review210439
-----------------------------------------------------------


On Nov. 8, 2018, 3:34 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69060/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 3:34 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3382
>     https://issues.apache.org/jira/browse/SQOOP-3382
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This patch is about adding support for fixed point decimal types in parquet import.
> 
> The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.
> 
> For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.
> 
> I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.
> 
> ** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
> - I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
> - However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
> - The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
> _Please share your thoughts on this!_
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 80c069888 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
>   src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
>   src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
>   src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
>   src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
>   src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
>   src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 
> 
> 
> Diff: https://reviews.apache.org/r/69060/diff/3/
> 
> 
> Testing
> -------
> 
> 3rd party tests and unit tests, both gradle and ant
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Re: Review Request 69060: SQOOP-3382 Add parquet numeric support for Parquet in hdfs import

Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69060/#review210439
-----------------------------------------------------------



Hi Fero,

Thanks for this improvement!

I have left a couple of comments related to testing, please find them below.

Thanks,
Bogi


src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java
Lines 220 (patched)
<https://reviews.apache.org/r/69060/#comment295098>

    I think these tests could be parameterized as they are doing the same but with different file formats (Avro and Parquet).



src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java
Lines 299 (patched)
<https://reviews.apache.org/r/69060/#comment295099>

    Why aren't you assert the result as a list?



src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java
Lines 301-305 (patched)
<https://reviews.apache.org/r/69060/#comment295100>

    With this logic now you don't test if the size of the expected result and the output are the same.


- Boglarka Egyed


On Nov. 8, 2018, 3:34 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69060/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 3:34 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-3382
>     https://issues.apache.org/jira/browse/SQOOP-3382
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This patch is about adding support for fixed point decimal types in parquet import.
> 
> The implementation is simple after the fact that parquet was upgraded to 1.9.0 in SQOOP-3381: we just need to register the GenericDataSupplier with AvroParquetOutputFormat.
> 
> For testing, we can reuse the existing Avro tests, because Sqoop uses Avro under the hood to write parquet.
> 
> I also moved around and renamed the classes involved in this change so their name and package reflect their purpose.
> 
> ** Note: A key design decision can be seen in the ImportJobTestConfiguration interface **
> - I decided to create a new function to get the expected results for each file format, since we seldom add new fileformats. 
> - However this also enforces future configurations to always define their expected result for every file forma or throw a NotImplementedException should they lack the support for one.
> - The alternative for this is to define the fileLayout as an input parameter instead. This would allow for better extendability.
> _Please share your thoughts on this!_
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 3724f250e 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 80c069888 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java 62334f8ab 
>   src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 7a2a5f9cd 
>   src/test/org/apache/sqoop/importjob/ImportJobTestConfiguration.java 14de910b9 
>   src/test/org/apache/sqoop/importjob/SplitByImportTest.java 7977c0b0f 
>   src/test/org/apache/sqoop/importjob/avro/AvroImportForNumericTypesTest.java ff13dc3bc 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MSSQLServerImportJobTestConfiguration.java 182d2967f 
>   src/test/org/apache/sqoop/importjob/avro/configuration/MySQLImportJobTestConfiguration.java e9bf9912a 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfiguration.java b7bad08c0 
>   src/test/org/apache/sqoop/importjob/avro/configuration/OracleImportJobTestConfigurationForNumber.java 465e61f4b 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationForNumeric.java 66715c171 
>   src/test/org/apache/sqoop/importjob/avro/configuration/PostgresqlImportJobTestConfigurationPaddingShouldSucceed.java ec4db41bd 
>   src/test/org/apache/sqoop/importjob/configuration/AvroTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/importjob/configuration/GenericImportJobSplitByTestConfiguration.java f137b56b7 
>   src/test/org/apache/sqoop/importjob/configuration/ParquetTestConfiguration.java PRE-CREATION 
>   src/test/org/apache/sqoop/util/ParquetReader.java 908ce566f 
> 
> 
> Diff: https://reviews.apache.org/r/69060/diff/3/
> 
> 
> Testing
> -------
> 
> 3rd party tests and unit tests, both gradle and ant
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>