You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by "hqx871 (via GitHub)" <gi...@apache.org> on 2023/02/04 03:18:51 UTC

[GitHub] [druid] hqx871 commented on pull request #13303: Hadoop based batch ingestion support range partition

hqx871 commented on PR #13303:
URL: https://github.com/apache/druid/pull/13303#issuecomment-1416646001

   > @hqx871 , I have left a few more comments.
   > 
   > Regarding the `sample` parameter:
   > 
   > * A more appropriate name for it would be `samplingFactor` since higher the value, fewer the samples we take. A value of 1 denoting no sampling at all.
   > * It would be better to pull the sampling logic out of `DeterminePartitionsJob` into a separate class and reuse it in the two places.
   > * It would also be nice to get some numbers to gauge the benefit that sampling provides and perhaps have a way for users to estimate the right sampling factor. Considering this, I would suggest to remove all the sampling logic from this PR and create a separate one for that, so that we can get the current one merged faster.
   > 
   > Also, could you please add a comment regarding the testing that you have done? I will try to run the hadoop ITs to ensure that things work as expected.
   
   I have removed all the sampling logic from this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org