Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/09/23 19:59:50 UTC

[GitHub] [incubator-hudi] umehrot2 commented on issue #915: Shade and relocate Avro dependency in hadoop-mr-bundle

umehrot2 commented on issue #915: Shade and relocate Avro dependency in hadoop-mr-bundle
URL: https://github.com/apache/incubator-hudi/pull/915#issuecomment-534259843
 
 
   > My question to you is: can Hive 2.3.5, as is, support Avro tables (not Parquet) that have logical types? If yes, we can look into what we can do to get parity.
   
   @vinothchandar I don't think we need to be concerned about whether Hive 2.3.5 can support Avro tables with logical types. If that were a problem, it would already exist today. For example, Spark 2.4.3 supports a newer version of Avro and handles logical types by converting them to fixed-length byte arrays. On the Hive 2.3.5 side, I believe it will try to convert these fixed-length byte arrays back to its own decimal type, so it should not necessarily need to understand LogicalType (if I understand correctly).
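   To make the decimal point concrete, here is a minimal sketch in plain Java (no Avro or Hive dependency) of the encoding the Avro specification describes for the decimal logical type on a fixed field: the unscaled value is stored as big-endian two's-complement bytes, sign-extended to the fixed size, and the scale comes from the schema. The class name, precision and scale are hypothetical and chosen only for illustration; this is not Spark's or Hive's actual code. It only shows that a reader which already knows the scale can rebuild the decimal from the raw bytes without any LogicalType API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

// Hypothetical helper illustrating the Avro "decimal on fixed" byte layout.
public final class DecimalFixedCodec {
    private DecimalFixedCodec() {}

    /** Smallest fixed size (in bytes) whose signed range covers `precision` decimal digits. */
    static int minBytesForPrecision(int precision) {
        // Largest unscaled magnitude is 10^precision - 1; add one bit for the sign.
        BigInteger max = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        int bits = max.bitLength() + 1;
        return (bits + 7) / 8;
    }

    /** Encodes `value` as a big-endian two's-complement unscaled integer of exactly `size` bytes. */
    static byte[] encode(BigDecimal value, int scale, int size) {
        // Throws ArithmeticException if the value would need rounding to fit the scale.
        BigInteger unscaled = value.setScale(scale, RoundingMode.UNNECESSARY).unscaledValue();
        byte[] minimal = unscaled.toByteArray(); // minimal-length big-endian two's complement
        if (minimal.length > size) {
            throw new IllegalArgumentException("value does not fit in " + size + " bytes");
        }
        byte[] out = new byte[size];
        // Sign-extend: pad with 0xFF for negative values, 0x00 otherwise.
        byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, size - minimal.length, pad);
        System.arraycopy(minimal, 0, out, size - minimal.length, minimal.length);
        return out;
    }

    /** Decodes the fixed bytes back into a decimal, using the scale known from the schema. */
    static BigDecimal decode(byte[] fixed, int scale) {
        return new BigDecimal(new BigInteger(fixed), scale);
    }

    public static void main(String[] args) {
        int precision = 10;
        int scale = 2;
        int size = minBytesForPrecision(precision);
        BigDecimal original = new BigDecimal("-12345.67");
        byte[] fixed = encode(original, scale, size);
        BigDecimal roundTrip = decode(fixed, scale);
        System.out.println(size + " bytes: " + original + " -> " + roundTrip);
    }
}
```

   With precision 10 and scale 2 the sketch uses 5 bytes and prints "5 bytes: -12345.67 -> -12345.67".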
   
   The problem is that we already bundle parquet-avro inside the bundle jars, which makes it really difficult to upgrade the Parquet version. I think Hudi should strive to work with its own versions of Parquet and Avro, irrespective of the consuming application. This particular change should at least make the Avro version used by Hudi the same as Spark's, so we can say that Hudi is always compiled against the version of Spark that actually writes the dataset.
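   Relocating Avro inside the bundle renames its classes under a different package prefix at build time, so they cannot collide with whichever Avro version Spark or Hive puts on the classpath. A quick way to check which copy the JVM actually loaded is to ask where a class came from. The sketch below is a hypothetical diagnostic, not part of Hudi; it assumes a relocated prefix of org.apache.hudi.org.apache.avro purely for illustration, while the real prefix is whatever the bundle's shade relocation pattern defines.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic: reports which jar (if any) provides a given class.
public final class AvroClasspathProbe {
    private AvroClasspathProbe() {}

    /** Returns where `className` was loaded from, or empty if it is not on the classpath. */
    static Optional<String> locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, AvroClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes have no CodeSource.
            if (src == null || src.getLocation() == null) {
                return Optional.of("<bootstrap>");
            }
            return Optional.of(src.getLocation().toString());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.avro.Schema",                 // unshaded Avro, e.g. from Spark or Hive
            "org.apache.hudi.org.apache.avro.Schema"  // hypothetical relocated name inside a bundle
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + locate(name).orElse("not found"));
        }
    }
}
```

   If both names resolve, each to a different jar, the two Avro versions are coexisting as intended.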
   
   If you are not confident about this change, I can certainly make it configurable, as you suggested. But on the EMR side we will have to maintain this change to be able to support Hudi with Spark 2.4.3 and Hive 2.3.5.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services