Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/25 07:36:23 UTC

[GitHub] [hudi] YuweiXiao commented on pull request #6636: [HUDI-4824]Add new index RANGE_BUCKET , when primary key is auto-increment like most mysql table

YuweiXiao commented on PR #6636:
URL: https://github.com/apache/hudi/pull/6636#issuecomment-1257139465

   > > Hey, thanks for the contribution. It is a great enhancement for the bucket index.
   > > At a high level, could we use the current BucketIndex abstraction to unify the implementation of the different BucketIndexEngines? Also, the dedicated Partitioner (i.e., SparkRangeBucketIndexPartitioner) may not be necessary, as long as we tag the file id during indexing (check out consistent hashing, which uses the default Partitioner).
   > 
   > ```
   >  Right now, rangeBucketIndex generates files like "00000009-0_2-12-29_20220924180225595.parquet", which do not contain any UUID element. I think that's OK, am I right?
   >  By the same logic, if simpleBucketIndex also behaved like this, SparkBucketIndexPartitioner might not be necessary either? And using the default partitioner could eliminate a lot of empty Spark tasks.
   > ```
   > 
   > @YuweiXiao
   
   Yeah, I was thinking the same thing: use the id as the name rather than concatenating it with the UUID. But I think the benefit is saving the metadata loading overhead (i.e., listing files to get the file name) rather than the one you mentioned. With the default partitioner (`UpsertPartitioner`), there should not be any empty partitions. Please correct me if I am wrong.
   
   Also, we had better follow the file group naming convention to avoid potential compatibility problems.
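
To make the naming trade-off concrete, here is a minimal sketch, not Hudi's actual implementation. The class `BucketFileIds` and its methods are hypothetical names, and it assumes the bucket number occupies the first 8 zero-padded characters of the file id, as in the file name quoted above. A plain id can be computed from the bucket number alone, while an id with a random UUID tail can only be learned by listing existing files.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper for illustration only; these are not Hudi APIs.
class BucketFileIds {
    // Assumption: the first 8 characters of a file id hold the zero-padded bucket number.
    static final int BUCKET_DIGITS = 8;

    /** Deterministic id: only the zero-padded bucket number, e.g. 9 -> "00000009". */
    public static String plainFileId(int bucket) {
        return String.format(Locale.ROOT, "%08d", bucket);
    }

    /** UUID-shaped id whose first 8 characters are replaced by the bucket number. */
    public static String uuidFileId(int bucket) {
        String uuid = UUID.randomUUID().toString(); // 36 characters
        return plainFileId(bucket) + uuid.substring(BUCKET_DIGITS);
    }

    /** Recovers the bucket number from either id style or from a full file name. */
    public static int bucketOf(String fileIdOrName) {
        return Integer.parseInt(fileIdOrName.substring(0, BUCKET_DIGITS));
    }

    public static void main(String[] args) {
        // File name quoted earlier in this thread.
        String fromPr = "00000009-0_2-12-29_20220924180225595.parquet";
        System.out.println(bucketOf(fromPr));
        // Known up front, without listing the partition.
        System.out.println(plainFileId(9));
        // The random tail differs on every call, so an existing id must be looked up.
        System.out.println(uuidFileId(9));
    }
}
```

Either style keeps the bucket number recoverable from the prefix, so the choice mainly affects whether a writer needs to list files first, and whether the result still matches the existing file group naming convention.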

