Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/01 05:17:20 UTC

[GitHub] [hudi] yihua commented on pull request #4695: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, with deps resolution

yihua commented on pull request #4695:
URL: https://github.com/apache/hudi/pull/4695#issuecomment-1026487129


   cc @vinothchandar 
   
   My approach is to pull the HFile-format classes from the HBase repo at release 2.4.9 into the `hudi-io` module of the hudi repo, renaming the package from `org.apache.hadoop.hbase` to `org.apache.hudi.hbase`.  I trimmed some classes to limit the number of deps pulled in.  All the backward-compatibility logic for KeyValue.KVComparator (hbase1) vs CellComparator (hbase2) is pulled in as well, so we can control it ourselves.  This way, any hudi logic that uses the HFile format goes through the internal `org.apache.hudi.hbase` classes, while SparkHoodieHBaseIndex still uses the hbase lib with its `org.apache.hadoop.hbase` classes (the two are independent).
   
   A few things to finalize:
   - I'm questioning whether we should flip the hbase version in the hudi repo at all.  If we can unlock the HFile format for the metadata table, Presto, and Trino with the first WIP PR, there is no real need to upgrade hbase to 2.x, which could introduce compatibility issues for SparkHoodieHBaseIndex.  Am I missing anything here?  wdyt?
   - Right now, protobuf is used to generate the proto classes, and I pulled in the .proto files and protobuf libs (hudi-io-proto module).  Should I just check the generated Java classes into the repo and get rid of the proto-related files altogether?  Alternatively, I can keep the hudi-io-proto module and have hudi-io include the generated code without depending on hudi-io-proto, so we can still evolve the protos in the future.
   - Regarding the new dependencies pulled in, I can trim the list further if any of them cause conflicts, e.g., `commons-lang3`, `protobuf`:
   ```
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <scope>provided</scope>
      </dependency>

      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <scope>provided</scope>
      </dependency>

      <dependency>
        <groupId>org.apache.hbase.thirdparty</groupId>
        <artifactId>hbase-shaded-protobuf</artifactId>
        <version>4.0.1</version>
      </dependency>

      <dependency>
        <groupId>org.apache.hbase.thirdparty</groupId>
        <artifactId>hbase-shaded-miscellaneous</artifactId>
        <version>4.0.1</version>
      </dependency>

      <dependency>
        <groupId>org.apache.hbase.thirdparty</groupId>
        <artifactId>hbase-shaded-gson</artifactId>
        <version>4.0.1</version>
      </dependency>

      <dependency>
        <groupId>org.apache.hbase.thirdparty</groupId>
        <artifactId>hbase-shaded-netty</artifactId>
        <version>4.0.1</version>
      </dependency>

      <dependency>
        <groupId>org.apache.htrace</groupId>
        <artifactId>htrace-core4</artifactId>
        <version>4.2.0-incubating</version>
      </dependency>

      <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.12.0</version>
        <scope>compile</scope>
      </dependency>

      <dependency>
        <groupId>org.apache.yetus</groupId>
        <artifactId>audience-annotations</artifactId>
        <version>0.13.0</version>
      </dependency>

      <dependency>
        <groupId>com.esotericsoftware</groupId>
        <artifactId>kryo-shaded</artifactId>
        <version>4.0.2</version>
      </dependency>
   ```  
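
   To make the trimming option concrete, here is a rough sketch of one way a conflicting entry such as `commons-lang3` could be handled: switching it from `compile` to `provided` scope, the same scope the Hadoop artifacts in the list above already use.  This is only an illustration and not something the PR does today; it assumes the engine runtime already ships a compatible `commons-lang3`, which would need to be verified per engine.
   ```
      <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.12.0</version>
        <!-- Assumption: the runtime supplies this jar, so it is needed
             only at compile time and is not bundled with hudi-io -->
        <scope>provided</scope>
      </dependency>
   ```
   The trade-off is that a `provided` dependency compiles against 3.12.0 but runs against whatever version the runtime supplies, so any API used by the pulled-in classes has to exist in that version too.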

