Posted to commits@druid.apache.org by "PramodSSImmaneni (via GitHub)" <gi...@apache.org> on 2023/05/03 22:49:00 UTC

[GitHub] [druid] PramodSSImmaneni commented on pull request #13982: Parallelize storage of incremental segments

PramodSSImmaneni commented on PR #13982:
URL: https://github.com/apache/druid/pull/13982#issuecomment-1533849615

@kfaraz Creating and saving the incremental index files was slow because of the large number of columns, and this caused ingestion lag to grow continuously. Once data had been ingested and an incremental segment was ready to be persisted (because the configured thresholds were reached), it would take about 3 to 10 seconds to create the incremental index files. This would cause ingestion to fall behind even though there were CPUs available on the node. Multiple segment intervals are ingested at the same time, so there are multiple index files, and these were being saved serially. From what I could see, there are no dependencies between them, so they can be persisted in parallel.
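
To illustrate the idea of persisting independent index files concurrently, here is a minimal sketch. It is not the code from this PR: `ParallelPersistSketch`, `persistOne`, and `persistAll` are hypothetical names, and `persistOne` is only a stand-in for writing one interval's incremental index to disk. It assumes the per-interval persists really are independent, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from the Druid codebase.
final class ParallelPersistSketch {

    // Stand-in for writing one interval's incremental index to disk.
    static String persistOne(String interval) {
        return "persisted:" + interval;
    }

    // Persists every interval's index on a fixed-size pool and waits for all of them.
    static List<String> persistAll(List<String> intervals, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String interval : intervals) {
                tasks.add(() -> persistOne(interval));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish; futures come back in task order.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while persisting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Persist failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With `numThreads` set to 1 this degenerates to the serial behavior, so the pool size is the only knob that changes how many index files are written at once.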

We have a mixture of datasources, and some are small and don't need this higher degree of parallelism. Initially I considered a MiddleManager (MM) property, but it would apply to all datasources, and a larger datasource could be starved while a smaller one is using the extra threads. Having the setting on a per-datasource basis lets it be tuned for each datasource.
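
As a rough sketch of the per-datasource choice described above, the snippet below resolves a persist thread count for each datasource. The class, the method, the map of settings, and the default of 1 are all assumptions made for illustration; they are not the property or API introduced by this PR.

```java
import java.util.Map;

// Hypothetical sketch; the setting and names are not from the Druid codebase.
final class PersistParallelismSketch {

    // Assumed default: one thread, i.e. serial persists.
    static final int DEFAULT_PERSIST_THREADS = 1;

    // Returns the persist thread count for a datasource, falling back to the
    // default when nothing is configured and never going below one thread.
    static int persistThreadsFor(String dataSource, Map<String, Integer> perDataSourceThreads) {
        int configured = perDataSourceThreads.getOrDefault(dataSource, DEFAULT_PERSIST_THREADS);
        return Math.max(1, configured);
    }
}
```

Under this assumption, a wide datasource could be given, say, 4 threads while small datasources keep the default, so the extra threads go only where they are needed.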

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: commits-help@druid.apache.org