Posted to dev@sqoop.apache.org by Szabolcs Vasas <va...@gmail.com> on 2018/07/16 15:56:18 UTC
Review Request 67929: Remove Kite dependency from the Sqoop project
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/
-----------------------------------------------------------
Review request for Sqoop.
Bugs: SQOOP-3329
https://issues.apache.org/jira/browse/SQOOP-3329
Repository: sqoop-trunk
Description
-------
- Removed kitesdk dependency from ivy.xml
- Removed Kite Dataset API based Parquet import implementation
- Since the Parquet library was a transitive dependency of the Kite SDK, I added org.apache.parquet.avro-parquet 1.9 as a direct dependency
- In this dependency the parquet package has changed to org.apache.parquet, so I needed to update several classes accordingly
- Removed all the Parquet related test cases from TestHiveImport. These scenarios are already covered in TestHiveServer2ParquetImport.
- Modified the documentation to reflect these changes.
Diffs
-----
ivy.xml 1f587f3eb
ivy/libraries.properties 565a8bf50
src/docs/user/hive-notes.txt af97d94b3
src/docs/user/import.txt a2c16d956
src/java/org/apache/sqoop/SqoopOptions.java cc1b75281
src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a
src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java 2180cc20e
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java 90b910a34
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java 66ebc5b80
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932
src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987
src/test/org/apache/sqoop/TestMerge.java 2b3280a5a
src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c
src/test/org/apache/sqoop/TestParquetImport.java b1488e8af
src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc11
src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512
src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f
src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4
src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10a
Diff: https://reviews.apache.org/r/67929/diff/1/
Testing
-------
Ran unit and third party tests.
Thanks,
Szabolcs Vasas
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by daniel voros <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206241
-----------------------------------------------------------
Ship it!
Thanks for the update! Verified on same cluster. Ship it!
- daniel voros
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206265
-----------------------------------------------------------
Ship it!
Ship It!
- Boglarka Egyed
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/
-----------------------------------------------------------
(Updated July 19, 2018, 1:52 p.m.)
Review request for Sqoop.
Changes
-------
Parquet version is set to 1.6.0.
Bugs: SQOOP-3329
https://issues.apache.org/jira/browse/SQOOP-3329
Repository: sqoop-trunk
Description
-------
- Removed kitesdk dependency from ivy.xml
- Removed Kite Dataset API based Parquet import implementation
- Since the Parquet library was a transitive dependency of the Kite SDK, I added org.apache.parquet.avro-parquet 1.9 as a direct dependency
- In this dependency the parquet package has changed to org.apache.parquet, so I needed to update several classes accordingly
- Removed all the Parquet related test cases from TestHiveImport. These scenarios are already covered in TestHiveServer2ParquetImport.
- Modified the documentation to reflect these changes.
Diffs (updated)
-----
ivy.xml 1f587f3eb
ivy/libraries.properties 565a8bf50
src/docs/user/hive-notes.txt af97d94b3
src/docs/user/import.txt a2c16d956
src/java/org/apache/sqoop/SqoopOptions.java cc1b75281
src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932
src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987
src/test/org/apache/sqoop/TestMerge.java 2b3280a5a
src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c
src/test/org/apache/sqoop/TestParquetImport.java b1488e8af
src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512
src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4
Diff: https://reviews.apache.org/r/67929/diff/2/
Changes: https://reviews.apache.org/r/67929/diff/1-2/
Testing
-------
Ran unit and third party tests.
File Attachments
----------------
trunkdependencies.graphml
https://reviews.apache.org/media/uploaded/files/2018/07/18/4df23fec-c7a7-4dc6-8ac1-0872ee6fdadf__trunkdependencies.graphml
kiteremovaldependencies.graphml
https://reviews.apache.org/media/uploaded/files/2018/07/18/e8cbb4d3-1da3-4b64-96ea-09f647ece126__kiteremovaldependencies.graphml
Thanks,
Szabolcs Vasas
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by Szabolcs Vasas <va...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/
-----------------------------------------------------------
(Updated July 18, 2018, 12:12 p.m.)
Review request for Sqoop.
Bugs: SQOOP-3329
https://issues.apache.org/jira/browse/SQOOP-3329
Repository: sqoop-trunk
Description
-------
- Removed kitesdk dependency from ivy.xml
- Removed Kite Dataset API based Parquet import implementation
- Since the Parquet library was a transitive dependency of the Kite SDK, I added org.apache.parquet.avro-parquet 1.9 as a direct dependency
- In this dependency the parquet package has changed to org.apache.parquet, so I needed to update several classes accordingly
- Removed all the Parquet related test cases from TestHiveImport. These scenarios are already covered in TestHiveServer2ParquetImport.
- Modified the documentation to reflect these changes.
Diffs
-----
ivy.xml 1f587f3eb
ivy/libraries.properties 565a8bf50
src/docs/user/hive-notes.txt af97d94b3
src/docs/user/import.txt a2c16d956
src/java/org/apache/sqoop/SqoopOptions.java cc1b75281
src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a
src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java 2180cc20e
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java 90b910a34
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java 66ebc5b80
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932
src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987
src/test/org/apache/sqoop/TestMerge.java 2b3280a5a
src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c
src/test/org/apache/sqoop/TestParquetImport.java b1488e8af
src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc11
src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512
src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f
src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4
src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10a
Diff: https://reviews.apache.org/r/67929/diff/1/
Testing
-------
Ran unit and third party tests.
File Attachments (updated)
----------------
trunkdependencies.graphml
https://reviews.apache.org/media/uploaded/files/2018/07/18/4df23fec-c7a7-4dc6-8ac1-0872ee6fdadf__trunkdependencies.graphml
kiteremovaldependencies.graphml
https://reviews.apache.org/media/uploaded/files/2018/07/18/e8cbb4d3-1da3-4b64-96ea-09f647ece126__kiteremovaldependencies.graphml
Thanks,
Szabolcs Vasas
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by Szabolcs Vasas <va...@gmail.com>.
> On July 18, 2018, 9:52 a.m., daniel voros wrote:
> > Hi!
> >
> > I was trying to run this on a minicluster but got the following error:
> >
> > ```
> > 2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
> > at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
> > at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
> > at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
> > at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
> > at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
> > at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
> > at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
> > at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
> > at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
> > ```
> >
> > This happens when we have a newer version of Parquet (1.8.1 IIRC) with an older Avro (1.7.7 in this case).
> >
> > Where is parquet coming from?
> > - 1.9 is coming from Sqoop since this new patch
> > - Hive's hive-exec jar also contains parquet classes shaded with the original packaging
> >
> > Which one gets picked seems random to me (it even changes between re-executions of mappers!). Both are in the distributed cache.
> >
> > Where is avro coming from?
> > - There can be multiple versions under Sqoop/Hive but it doesn't really matter. Hadoop is packaged with avro under `share/hadoop/*/lib`. The jars there will take precedence over user classpath. This can be changed with `mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not to override anything that Hadoop relies on.
> >
> > I've come across this issue before and solved it by shading the Parquet classes. Note that this could be harder to do with Sqoop's ant build scripts.
> >
> > Some other minor observations:
> > - Hadoop 3.1.0 still has Avro 1.7.7
> > - Hive has been using incompatible versions of Avro and Parquet for a long time, but they're not relying on parts of Parquet that require Avro.
> >
> > Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes might help spot some other options! Can you please take a look and validate what I've found?
> >
> > Regards,
> > Daniel
Hi Dani,
Thanks for looking into this!
What is this minicluster environment you are referring to, and how can I set it up on my side?
I have taken a quick look at the dependencies and I can see that Hive references Parquet 1.6 so that might cause an issue.
We can change this patch to keep the parquet-avro 1.6.0 dependency (which was previously brought in by Kite) so that we stay in line with the Hive dependencies. Later, with the Hadoop 3/Hive 3 upgrade, we could look at how to upgrade the Parquet dependency.
At this point we do not require Parquet 1.9; I only added it because it is a fairly recent version, and nothing in the patch relies on it.
I will upload the graphml dependency files for reference.
- Szabolcs
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
-----------------------------------------------------------
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by daniel voros <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
-----------------------------------------------------------
Hi!
I was trying to run this on a minicluster but got the following error:
```
2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
```
This happens when we have a newer version of Parquet (1.8.1 IIRC) with an older Avro (1.7.7 in this case).
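As an illustration only (not something in the patch), a mismatch like this can be checked with a small reflection probe: it asks whether the `Schema` class visible on the current classpath has the no-argument `getLogicalType()` method named in the trace above. `MethodProbe` and `hasNoArgMethod` are hypothetical names made up for this sketch; only the Avro class and method names come from the stack trace.

```java
import java.lang.reflect.Method;

public class MethodProbe {
    /**
     * Returns true if the named class can be loaded and exposes a public
     * no-argument method with the given name; false otherwise.
     * (Hypothetical helper, written for illustration.)
     */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            // Load without initializing the class, using this class's loader.
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            Method m = cls.getMethod(methodName);
            return m != null;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace above.
        boolean ok = hasNoArgMethod("org.apache.avro.Schema", "getLogicalType");
        System.out.println("Schema.getLogicalType() available: " + ok);
    }
}
```

Run with the same classpath as the failing task, this would be expected to report `false` whenever the older Avro is the one that gets loaded.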
Where is parquet coming from?
- 1.9 is coming from Sqoop since this new patch
- Hive's hive-exec jar also contains parquet classes shaded with the original packaging
Which one gets picked seems random to me (it even changes between re-executions of mappers!). Both are in the distributed cache.
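To see which copy wins in a given JVM, one option is to print the location a class was actually loaded from. The following is a minimal sketch using only the JDK; `WhichJar` and `locate` are hypothetical names, and the two class names passed in `main` are simply the ones that appear in the stack trace.

```java
import java.security.CodeSource;

public class WhichJar {
    /**
     * Describes where the named class was loaded from, or why that is unknown.
     * (Hypothetical helper, written for illustration.)
     */
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK's bootstrap loader have no code source.
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        System.out.println(locate("org.apache.parquet.avro.AvroSchemaConverter"));
        System.out.println(locate("org.apache.avro.Schema"));
    }
}
```

Logging these locations from inside a mapper should show whether the class came from the Sqoop-provided jar, from hive-exec, or from Hadoop's own lib directory.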
Where is avro coming from?
- There can be multiple versions under Sqoop/Hive but it doesn't really matter. Hadoop is packaged with avro under `share/hadoop/*/lib`. The jars there will take precedence over user classpath. This can be changed with `mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not to override anything that Hadoop relies on.
I've come across this issue before and solved it by shading the Parquet classes. Note that this could be harder to do with Sqoop's ant build scripts.
Some other minor observations:
- Hadoop 3.1.0 still has Avro 1.7.7
- Hive has been using incompatible versions of Avro and Parquet for a long time, but they're not relying on parts of Parquet that require Avro.
Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes might help spot some other options! Can you please take a look and validate what I've found?
Regards,
Daniel
- daniel voros
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by Fero Szabo via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206150
-----------------------------------------------------------
Ship it!
A long-awaited patch! :)
(Anyway, you could link the review to the Jira as well.)
- Fero Szabo
Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Posted by Boglarka Egyed <bo...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206148
-----------------------------------------------------------
Ship it!
Hi Szabolcs,
We have reached a great milestone with this patch! :)
Thanks for all your effort on removing the Kite dependency and enabling Sqoop to use the Parquet library directly for reading and writing Parquet files!
Your patch looks good, unit and 3rd party tests passed.
Thanks,
Bogi
- Boglarka Egyed