Posted to dev@parquet.apache.org by Szabolcs Váradi <sv...@google.com.INVALID> on 2019/07/09 14:28:07 UTC

Commons Collection v3 dependency in HadoopCodecs

Dear Developers,

Thank you for your hard work on Apache Parquet!

I am a software engineer working on a solution for converting large
batches of Parquet files into Capacitor files. I have a working prototype
that I am trying to finalize, but I ran into an issue with our dependency
management policy, which allows only one version of a library. Apache
Commons Collections4 is already in our dependency directory, and the current
Parquet library requires the Commons Collections v3 library. I tried to get
an exception for my case, but since the v3 library has not been maintained
for a long time, I was told to try working with the community.

I was able to trace where the dependency is used: HadoopCodecs. Hadoop's
Configuration uses Collections v3 in the form of an UnmodifiableMap. The
ParquetReadOptions.Builder class seems to be tightly coupled to this
CodecFactory. The ParquetReadOptions constructor cannot be accessed from
outside the package, which prevented me from replacing it with my own
implementation.

https://github.com/apache/parquet-mr/blob/8ff867a2e183f50b7b2f2c6e51d07c5314577ce0/parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java#L149
https://github.com/apache/parquet-mr/blob/8ff867a2e183f50b7b2f2c6e51d07c5314577ce0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopCodecs.java#L29
https://github.com/apache/hadoop/blob/8861573e8c97b0d040aeb9bdd71c6f0c62038af7/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L510
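
For reference, a conflict like this can be checked with a small,
self-contained probe. The sketch below is not part of Parquet or Hadoop:
the class name CollectionsClasspathCheck and its methods are hypothetical.
It only assumes the standard package names of the two Commons Collections
generations (org.apache.commons.collections for v3 and
org.apache.commons.collections4 for v4), and it reports which of the two
UnmodifiableMap classes are visible on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Parquet or Hadoop.
public class CollectionsClasspathCheck {

    // Assumed class names: the v3 and v4 variants of UnmodifiableMap.
    static final String V3_CLASS = "org.apache.commons.collections.map.UnmodifiableMap";
    static final String V4_CLASS = "org.apache.commons.collections4.map.UnmodifiableMap";

    // Returns true if the class can be located, without running its static initializers.
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, CollectionsClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Probes both generations and keeps the results in insertion order (v3, then v4).
    public static Map<String, Boolean> probe() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        result.put(V3_CLASS, isPresent(V3_CLASS));
        result.put(V4_CLASS, isPresent(V4_CLASS));
        return result;
    }

    public static void main(String[] args) {
        probe().forEach((name, present) ->
                System.out.println(name + " -> " + (present ? "present" : "absent")));
    }
}
```

Running it with the application's classpath shows whether v3 is being
pulled in alongside v4.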

Could you point me in the right direction on how it would be possible to
work around this dependency?
What would be your suggestion?

Thanks,
Szabolcs