Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/31 15:30:01 UTC

[GitHub] [druid] Vinothkarthick opened a new issue #11647: Is there any possibility of rowlevel update in druid in near future?

Vinothkarthick opened a new issue #11647:
URL: https://github.com/apache/druid/issues/11647


   ### Problem
   
   In Druid, one dataset has 2,400 segments, and we have 30 such datasets. Every day, records belonging to these 2,400 segments get updated. The share of updated records is very low (< 0.1%), but the updates span every segment. As a result, we end up backfilling all datasets on a daily basis.
   
   - We run batch ingestion with the index_parallel task type on a daily basis. The dataset that we load into Druid from Snowflake (where we hold enterprise-wide data) receives daily updates to records going back as far as 20 years. The updated records amount to less than 1% of the total records in the table, but this 1% of updated data spans all segments in Druid. So we backfill the entire dataset in Druid every day.
   
   - Because of this use case, the cost of the Druid cluster is shooting up due to the large number of MiddleManager nodes.
   
   ### Ask
   - If there were a way to update only the records that changed (maybe a SQL MERGE kind of functionality), it would be beneficial.
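
To make the ask concrete, here is a minimal sketch of the merge semantics I have in mind, written in plain Python. It is not a Druid API. The record layout, the `id` key column, and the function name `merge_changed_records` are all hypothetical and only illustrate the idea of touching just the changed rows.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def merge_changed_records(
    existing: Iterable[Record],
    changed: Iterable[Record],
    key: str = "id",  # hypothetical unique key column, assumed for illustration
) -> Tuple[List[Record], int]:
    """Upsert `changed` into `existing` by `key`.

    Returns the merged rows and the number of rows actually written.
    """
    # Index the existing rows by key; dicts preserve insertion order.
    merged = {row[key]: row for row in existing}
    touched = 0
    for row in changed:
        # Write only when the row is new or differs from the stored copy.
        if merged.get(row[key]) != row:
            merged[row[key]] = row
            touched += 1
    return list(merged.values()), touched


# Hypothetical sample data: three stored rows spread over 20 years.
existing_rows = [
    {"id": 1, "ts": "2001-01-01", "amount": 10},
    {"id": 2, "ts": "2011-06-15", "amount": 20},
    {"id": 3, "ts": "2021-08-30", "amount": 30},
]
changed_rows = [
    {"id": 2, "ts": "2011-06-15", "amount": 25},  # update to an old row
    {"id": 4, "ts": "2021-08-31", "amount": 40},  # brand-new row
]

merged_rows, touched_count = merge_changed_records(existing_rows, changed_rows)
print(touched_count, "of", len(merged_rows), "rows written")
```

The point of the sketch is that the work scales with the number of changed rows rather than with the size of the dataset, which is what a daily full backfill cannot offer.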
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


