Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/30 20:30:03 UTC

[GitHub] [incubator-druid] pdeva opened a new issue #8611: allow sharding of timeseries data

pdeva opened a new issue #8611: allow sharding of timeseries data
URL: https://github.com/apache/incubator-druid/issues/8611

### Description

Allow sharding of data by a dimension, e.g. `accountId`.

This way segments are stored:
1. per `accountId`
2. and then by time range within that `accountId`

### Motivation

We use the terminology of 'small' and 'large' accounts below. Small and large here mean small and large in terms of rows of data per account. So a small account may generate 500 rows per hour while a large account may generate 5M rows per hour.

Currently, any query requires loading all the segments in the query's time range.

If you are storing data for multiple 'accounts' in a single Druid cluster, then a query by a small account requires loading as many segments as a query by a large account.

Ideally, if a small account is making a query, the cluster should need minimal IO and load only the segments for that particular account.
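
To make the difference concrete, here is a minimal sketch of the idea as a toy model in Python. It is not Druid code and does not use any Druid API: the `Segment` type, the helper functions, and the segment counts per hour are all hypothetical, chosen only to illustrate selecting segments by time range alone versus by `accountId` and then time range.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in for a segment: one account, one hour-aligned interval."""
    account_id: str
    start_hour: int  # inclusive
    end_hour: int    # exclusive


def overlaps(seg, start, end):
    """True if the segment's interval intersects [start, end)."""
    return seg.start_hour < end and start < seg.end_hour


def select_by_time(segments, start, end):
    """Time-only selection: every segment in the range is a candidate."""
    return [s for s in segments if overlaps(s, start, end)]


def build_account_index(segments):
    """Group segments per account, mirroring 'per accountId, then time range'."""
    index = defaultdict(list)
    for s in segments:
        index[s.account_id].append(s)
    return index


def select_by_account_and_time(index, account_id, start, end):
    """Sharded selection: look only at the querying account's segments."""
    return [s for s in index.get(account_id, []) if overlaps(s, start, end)]


def make_segments(account_id, hours, per_hour):
    """Assumed data volume: `per_hour` segments for each of `hours` hours."""
    return [
        Segment(account_id, h, h + 1)
        for h in range(hours)
        for _ in range(per_hour)
    ]


# Assumption for illustration: the large account produces 10x the segments.
all_segments = make_segments("small", 24, 1) + make_segments("large", 24, 10)
account_index = build_account_index(all_segments)

time_only = select_by_time(all_segments, 0, 24)
sharded = select_by_account_and_time(account_index, "small", 0, 24)
print(len(time_only), len(sharded))
```

In this toy model, a one-day query from the small account touches every segment in the time range under time-only selection, but only its own segments once they are grouped by `accountId` first.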
