Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/24 15:18:17 UTC

[GitHub] [hudi] vinothchandar commented on pull request #1721: [WIP] [HUDI-1041] Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

vinothchandar commented on pull request #1721:
URL: https://github.com/apache/hudi/pull/1721#issuecomment-648885633


   > Regarding sampling, what if some of the partitions are skewed? Will that cause more overhead than flushing the file out?
   
   IIRC, the partitionRecordKeyPairRDD should have an even distribution of keys after the precombine step, which just does a `reduceByKey`. We can always add a config to increase the sampling rate, right? It all depends on how much the computed parallelism differs between samplingRate=0.1 and samplingRate=1.0.
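
To make the samplingRate comparison concrete, here is a rough standalone sketch. It is not Hudi or Spark code: `estimate_parallelism`, `keys_per_task`, and the skewed partition layout are all made up for illustration. It assumes parallelism is derived from a total key count that is estimated from a sample and scaled back up.

```python
import math
import random


def estimate_parallelism(keys_per_partition, sampling_rate, keys_per_task, seed=42):
    """Estimate a shuffle parallelism from a sample of (partition, key) pairs.

    Hypothetical helper: the names and the keys_per_task target are
    assumptions for this sketch, not Hudi APIs or configs.
    """
    rng = random.Random(seed)
    estimated_total = 0.0
    for num_keys in keys_per_partition.values():
        # Keep each key independently with probability sampling_rate.
        sampled = sum(1 for _ in range(num_keys) if rng.random() < sampling_rate)
        # Scale the sampled count back up to estimate the true count.
        estimated_total += sampled / sampling_rate
    return max(1, math.ceil(estimated_total / keys_per_task))


# A deliberately skewed layout: one hot partition and many small ones.
skewed = {"2020/06/24": 500_000}
skewed.update({f"2020/06/{day:02d}": 5_000 for day in range(1, 24)})

full = estimate_parallelism(skewed, sampling_rate=1.0, keys_per_task=10_000)
sampled = estimate_parallelism(skewed, sampling_rate=0.1, keys_per_task=10_000)
print(f"parallelism at samplingRate=1.0: {full}")
print(f"parallelism at samplingRate=0.1: {sampled}")
```

Under these assumptions the two estimates should land close together even with one hot partition, because only the total count feeds the parallelism. If the real computation depends on per-partition or per-file counts, the gap could be larger and would need measuring on actual data.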


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org