Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/01/01 10:50:32 UTC

[GitHub] [druid] erankor opened a new issue #10727: Pre-aggregation of events before Druid

erankor opened a new issue #10727:
URL: https://github.com/apache/druid/issues/10727


   Hi all,
   
   This is not really an issue, but I wanted to share it here in case someone finds it useful.
   
   We are using Druid to monitor the performance of our application, e.g. how much time the server spends on DB queries, how much time is spent on search engine queries, etc. We have multiple dimensions, such as the customer id, the API server host, and the API action.
   The number of events is quite large, around 10B a day, and ingesting them into Druid directly would probably take a significant number of middle managers.
   The solution we ended up with is to pre-aggregate the events before pushing them to Druid. We implemented an Nginx module that receives the events over UDP from our API servers, aggregates them based on a Druid-like schema, and pushes the aggregated events to a Kafka topic. Druid then pulls the events from Kafka using its Kafka ingestion service.
   We have a single such Nginx server, and it handles all the events with CPU utilization averaging around 10%. In the configuration we currently use, this aggregation reduces the number of events by a factor of ~170, so a very small Druid cluster is enough to handle them.
   The Nginx module is open source, and available here - https://github.com/kaltura/nginx-aggr-module/
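   
   As a rough illustration of the aggregation step described above (this is not the Nginx module's code, just a minimal Python sketch), events that share a time bucket and the same dimension values can be collapsed into one row with a count and summed metrics. The field names (`timestamp`, `customer_id`, `action`, `db_time_ms`) and the 60-second granularity are assumptions made for the example.

```python
def aggregate(events, dimensions, metrics, granularity_s=60):
    """Roll up raw events into one row per (time bucket, dimension values).

    All field names and the default granularity are hypothetical; they are
    not taken from the Nginx module or from any Druid schema.
    """
    rows = {}
    for ev in events:
        # Truncate the timestamp (epoch seconds) to the start of its bucket.
        bucket = int(ev["timestamp"]) // granularity_s * granularity_s
        key = (bucket,) + tuple(ev.get(d) for d in dimensions)
        row = rows.get(key)
        if row is None:
            row = {"count": 0}
            row.update({m: 0.0 for m in metrics})
            rows[key] = row
        row["count"] += 1
        for m in metrics:
            row[m] += ev.get(m, 0.0)

    # Flatten the grouped sums back into plain records, one per group.
    out = []
    for key, sums in rows.items():
        rec = {"timestamp": key[0]}
        rec.update(zip(dimensions, key[1:]))
        rec.update(sums)
        out.append(rec)
    return out
```

   Each output record could then be serialized (for example as JSON) and published to the Kafka topic. The reduction factor depends on how many raw events fall into the same bucket and dimension combination.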
   
   Thanks,
   
   Eran


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] Armstrongya edited a comment on issue #10727: Pre-aggregation of events before Druid

Posted by GitBox <gi...@apache.org>.
Armstrongya edited a comment on issue #10727:
URL: https://github.com/apache/druid/issues/10727#issuecomment-754526677


   @erankor Great job. We face the same challenge. We use a Flink real-time pipeline to pre-aggregate the ETL-cleaned data, which dramatically reduces the input data volume for Druid. We also use the Apache DataSketches library to support percentile aggregation in the Flink job.
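
   Percentiles need special care when pre-aggregating because, unlike sums and counts, they cannot be combined from per-group results unless the intermediate state is mergeable. The sketch below illustrates that idea with a plain fixed-bucket histogram in Python. It is a simplified stand-in, not the DataSketches API and not the Flink job mentioned above, and the bucket bounds are hypothetical.

```python
import bisect

# Hypothetical upper bounds (in ms) of the histogram buckets.
BOUNDS_MS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]


def new_hist():
    # One extra slot at the end counts values above the largest bound.
    return [0] * (len(BOUNDS_MS) + 1)


def add(hist, value_ms):
    # bisect_left finds the first bucket whose upper bound is >= value_ms.
    hist[bisect.bisect_left(BOUNDS_MS, value_ms)] += 1


def merge(a, b):
    # Histograms with identical bounds merge by adding their counts.
    return [x + y for x, y in zip(a, b)]


def quantile_upper_bound(hist, q):
    """Return the upper bound of the bucket that contains the q-quantile."""
    total = sum(hist)
    if total == 0:
        return None
    target = q * total
    seen = 0
    for i, count in enumerate(hist):
        seen += count
        if seen >= target:
            return BOUNDS_MS[i] if i < len(BOUNDS_MS) else float("inf")
```

   Two partial histograms built on different workers can be merged before the quantile is estimated, which is the property a pre-aggregation stage relies on. A real quantile sketch provides much better accuracy guarantees than fixed buckets.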




[GitHub] [druid] Armstrongya commented on issue #10727: Pre-aggregation of events before Druid

Posted by GitBox <gi...@apache.org>.
Armstrongya commented on issue #10727:
URL: https://github.com/apache/druid/issues/10727#issuecomment-754526677


   Great job. We face the same challenge. We use a Flink real-time pipeline to pre-aggregate the ETL-cleaned data, which dramatically reduces the input data volume for Druid.

