
Posted to commits@druid.apache.org by "kfaraz (via GitHub)" <gi...@apache.org> on 2023/05/03 04:39:45 UTC

[GitHub] [druid] kfaraz commented on pull request #13982: Parallelize storage of incremental segments

kfaraz commented on PR #13982:
URL: https://github.com/apache/druid/pull/13982#issuecomment-1532446523

@PramodSSImmaneni, 3 to 10 seconds seems like a reasonable persist time for a datasource with 4000 columns.
Did you see this time increase abruptly in any specific scenario? I agree that there is room for parallelization here, but we need to understand what concrete benefits we would get from it.

It would be nice if you could share some more details on the following points:
- Did real-time ingestion slow down at any point, i.e., was there a buildup of lag?
- How long did it take to persist the segments?
- Once the segments were persisted, did the lag catch up?
- With parallelization, how did the lag behave?
- What was your cluster setup (number of task slots, worker node types)?

If we do find that parallelization is needed here, it should be a MiddleManager/Indexer runtime property rather than a tuning config that has to be passed through every index spec. Users running an ingestion should not need to be exposed to this detail.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
