Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/10 17:33:17 UTC

[GitHub] [druid] a2l007 opened a new issue #10975: Performance degradation when indexing a large number of dimensions

a2l007 opened a new issue #10975:
URL: https://github.com/apache/druid/issues/10975


   ### Affected Version
   
   Druid 0.20.0
   
   ### Description
   
   Scaling one of our Druid clusters from indexing a couple of hundred dimensions to over 3k dimensions led to an interesting performance problem: ingestion throughput dropped drastically as the number of dimensions went up, from ~65 MB/s at ~300 dimensions to ~8 MB/s at 3,000 dimensions, even with the memory config tuned proportionally. The cluster is set up as follows (a sketch of the relevant spec fragments follows the list):
   
   1. Uses the Kafka indexing service
   2. All 3K columns arrive in CSV format
   3. All columns are loaded as string dimensions, with `createBitmapIndex` set to `false` for every column
   4. `maxRowsInMemory` set to 2500 (this results in more frequent flushes; higher values increased the flush time to off-heap/disk proportionally)
   5. `segmentWriteOutMediumFactory` configured to write to off-heap memory
   6. Kafka configured with 36 partitions and 36 peons, yet only sustaining an ingest rate of ~10 MB/s
   7. 32 GB each of heap and direct memory per task; this was tuned up and down without a significant change in throughput
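
   For reference, here is a minimal sketch of the relevant spec fragments for a setup like this. This is not our actual spec: the dataSource, topic, broker address, and column names are placeholders, and only two of the ~3K dimension entries are shown.

   ```json
   {
     "type": "kafka",
     "spec": {
       "dataSchema": {
         "dataSource": "wide_datasource",
         "timestampSpec": { "column": "ts", "format": "auto" },
         "dimensionsSpec": {
           "dimensions": [
             { "type": "string", "name": "dim_0001", "createBitmapIndex": false },
             { "type": "string", "name": "dim_0002", "createBitmapIndex": false }
           ]
         }
       },
       "ioConfig": {
         "topic": "wide_topic",
         "inputFormat": { "type": "csv", "columns": ["ts", "dim_0001", "dim_0002"] },
         "consumerProperties": { "bootstrap.servers": "kafka:9092" },
         "taskCount": 36
       },
       "tuningConfig": {
         "type": "kafka",
         "maxRowsInMemory": 2500,
         "segmentWriteOutMediumFactory": { "type": "offHeapMemory" }
       }
     }
   }
   ```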
   
   Segment persists now take ~4 seconds for just 2,500 records, and this seems to be contributing to the slowdown we're seeing: as a back-of-the-envelope check, at 2,500 rows per persist and ~4 s per persist, a single task tops out around 625 rows/s on persist time alone, or roughly 22,500 rows/s across all 36 tasks.
   The flame graph suggests that `StringDimensionIndexer.processRowValsToUnsortedEncodedKeyComponent` is taking a significant amount of time (see the sketch below), but that alone doesn't explain the slow persist behavior.
   
   ![Flame graph screenshot (Screen Shot 2021-03-10 at 10 16 33 AM)](https://user-images.githubusercontent.com/4603202/110665548-19048d80-818e-11eb-8b5b-404eee02d21a.png)
   
   I've attached the complete flame graph with this issue as well.
   [druid-flame-197025.html.zip](https://github.com/apache/druid/files/6117767/druid-flame-197025.html.zip)
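
   To illustrate where the row-processing time likely goes, here is a simplified sketch of the dictionary-encoding work implied by the flame graph. This is not Druid's actual `StringDimensionIndexer` code, just the shape of it: each string dimension keeps its own value dictionary, so every row pays one hash lookup (plus a possible insert) per dimension, and at persist time each of those per-dimension dictionaries also has to be sorted and serialized into its own column, so both costs grow linearly with the dimension count.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Simplified sketch (not the actual Druid code) of per-row dictionary
   // encoding for string dimensions.
   public class DictionaryEncodingSketch {
     // One String -> id dictionary per dimension.
     private final List<Map<String, Integer>> dictionaries = new ArrayList<>();

     public DictionaryEncodingSketch(int numDims) {
       for (int i = 0; i < numDims; i++) {
         dictionaries.add(new HashMap<>());
       }
     }

     // Encode one parsed CSV row, where values[i] is the raw string for
     // dimension i. With ~3,000 dimensions this is ~3,000 hash lookups and a
     // fresh 3,000-entry int[] per row before the row even reaches the
     // in-memory index, so this loop scales linearly with dimension count.
     public int[] encodeRow(String[] values) {
       int[] encoded = new int[values.length];
       for (int dim = 0; dim < values.length; dim++) {
         Map<String, Integer> dict = dictionaries.get(dim);
         // Lookup-or-assign the next id; ids are dense per dimension.
         encoded[dim] = dict.computeIfAbsent(values[dim], v -> dict.size());
       }
       return encoded;
     }
   }
   ```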
   
   
   The end goal is to scale well beyond 3k columns, but I'm wondering if we've hit a bottleneck somewhere that is preventing us from getting decent throughput.
   I'm hoping to investigate this further, but wanted to see if there are any thoughts on how to improve it.
   @gianm @jihoonson Any comments about this?
   




