Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/01/10 07:02:43 UTC

[GitHub] surekhasaharan opened a new issue #6834: [Proposal] Add published segment cache in broker

URL: https://github.com/apache/incubator-druid/issues/6834
 
 
   ## Problem:
   
   Some `sys.segments` queries are slow, taking as long as ~10-20 seconds, which is not desirable. The cause of this slowness is a call from the broker to a coordinator API that happens every time a query is issued to the `sys.segments` table: the `getMetaDataSegments` method (which invokes the coordinator API `/druid/coordinator/v1/metadata/segments`) is called from `SegmentsTable#scan()` in `SystemSchema.java`. The coordinator can potentially return millions of segments, and most of the time is spent parsing the JSON response and creating `DataSegment` objects.
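   
   For context, the broker-side fetch looks roughly like the sketch below. This is a simplified illustration, not the actual Druid code: the endpoint is the one named above, but the plain `ObjectMapper`-over-`URL` call stands in for Druid's HTTP client and injected mappers, and the class name is made up.
   
   ```java
   import com.fasterxml.jackson.databind.ObjectMapper;
   import java.net.URL;
   import java.util.List;
   import org.apache.druid.timeline.DataSegment;
   
   // Sketch of the slow path: on every sys.segments query, the broker fetches
   // *all* published segments from the coordinator and deserializes them. With
   // millions of segments, JSON parsing and DataSegment construction dominate.
   public class PublishedSegmentsFetch
   {
     private static final ObjectMapper MAPPER = new ObjectMapper();
   
     public static List<DataSegment> fetchAllPublishedSegments(String coordinatorHost) throws Exception
     {
       // Simplified: the real broker uses Druid's async HTTP client and its
       // injected, Druid-configured ObjectMapper rather than a bare URL read.
       URL url = new URL("http://" + coordinatorHost + "/druid/coordinator/v1/metadata/segments");
       return MAPPER.readValue(
           url,
           MAPPER.getTypeFactory().constructCollectionType(List.class, DataSegment.class)
       );
     }
   }
   ```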
   
   ## Motivation:
   
   It would be useful to make these queries faster, since they are used interactively by end users today. In the future, a unified Druid console could be built on top of the sys tables (#6832), and the new segment locking work could also benefit from having all used segments present in the broker.
   
   ## Proposed Changes:
   
   To fix this performance bottleneck, the plan is to add:
   
   1. segment cache in broker (phase 1)
   2. a new api in coordinator (phase 2)
   
   #### Phase 1
   
   To speed up `sys.segments` queries, phase 1 adds a published-segment cache in the broker. The broker already maintains a cache of all available segments via the ServerView (brokers cache segments announced by historicals), but not segments from the metadata store (published segments are cached only in the coordinator). The new cache would be updated in the background and would therefore allow faster query response times from the broker.
   
   A potential issue is that this could put memory pressure on the broker if the number of published segments is large. To minimize this, the `DataSegment` instance should be shared between the "available" segments in the existing broker cache and the "published" segments in the new cache. Roughly, for about a million segments that are both published and available, the heap space for the references would be ~10 MB. In addition, the complete `DataSegment` object would be stored only for "published but unavailable" segments, which ideally should be close to zero. A sketch of this instance sharing follows.
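   
   A minimal sketch of the instance-sharing idea, keyed by the segment identifier string; the class and method names here are hypothetical, not existing Druid APIs:
   
   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;
   import org.apache.druid.timeline.DataSegment;
   
   // Hypothetical broker-side cache: published segments reuse the DataSegment
   // instance already held for available segments, so each shared segment costs
   // only a map entry (a reference), not a second full object on the heap.
   public class PublishedSegmentCache
   {
     // Segments announced by historicals (the broker's existing ServerView cache).
     private final ConcurrentMap<String, DataSegment> availableSegments = new ConcurrentHashMap<>();
   
     // Published segments from the metadata store, keyed by segment identifier.
     private final ConcurrentMap<String, DataSegment> publishedSegments = new ConcurrentHashMap<>();
   
     public void addPublishedSegment(String segmentId, DataSegment fromCoordinator)
     {
       // Share the already-cached instance when the segment is also available;
       // keep the full object only for "published but unavailable" segments.
       DataSegment available = availableSegments.get(segmentId);
       publishedSegments.put(segmentId, available != null ? available : fromCoordinator);
     }
   
     public void removePublishedSegment(String segmentId)
     {
       publishedSegments.remove(segmentId);
     }
   }
   ```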
   
   #### Phase 2
   
   In phase 2 of this improvement, a more efficient coordinator API should be added. There are several ways to design this new API; see Rejected Alternatives below for the other options considered.
   
   This API takes a timestamp as an argument and returns a delta of added/removed segments. When the broker comes up, it gets all the published segments from the coordinator. The broker then does the following: orders the received segments by timestamp (`created_date`), saves the published segments in its cache, and keeps track of the last received segment's timestamp. Subsequent calls to the coordinator API return only the segments that have been added or removed since that timestamp. The broker polls the coordinator API at a regular interval in a background thread to keep the published-segment cache in sync.
   
   The "added_segments" delta can be computed from `created_date`; additional work is required to compute the "deleted_segments" delta. The coordinator will need to maintain an in-memory list of deleted segments and will need to be notified when a segment gets killed externally to the coordinator (unless this behavior is changed as suggested in #6816). Since the deleted-segment count can grow over time, to avoid memory pressure the coordinator can remember an hour (or some other configurable window) of deleted segments. If the requested timestamp is older than that window, all the published segments can be resynced. Likewise, on coordinator restart or leader change, the coordinator can simply send all the published segments again. A sketch of the broker-side polling loop follows.
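   
   A minimal sketch of the broker-side polling loop under the assumptions above. The `SegmentDelta` response shape, the `CoordinatorDeltaClient` interface, and all names here are hypothetical; it reuses the `PublishedSegmentCache` sketched in phase 1:
   
   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;
   import org.apache.druid.timeline.DataSegment;
   
   // Hypothetical shape of the coordinator's delta response (not an existing Druid class).
   class SegmentDelta
   {
     Map<String, DataSegment> addedSegments;  // keyed by segment id; created_date > requested timestamp
     List<String> deletedSegmentIds;          // ids of segments removed since the requested timestamp
     String newestCreatedDate;                // high-water mark to use as the next request's timestamp
     boolean fullResyncRequired;              // set when the timestamp predates the retained deletion window
   }
   
   // Hypothetical client wrapping GET /segments/{timestamp}; passing null fetches everything.
   interface CoordinatorDeltaClient
   {
     SegmentDelta getSegmentsSince(String timestamp);
   }
   
   public class PublishedSegmentPoller
   {
     private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
     private final PublishedSegmentCache cache;   // the cache sketched in phase 1 above
     private final CoordinatorDeltaClient client;
     private String lastSyncTimestamp = null;
   
     public PublishedSegmentPoller(PublishedSegmentCache cache, CoordinatorDeltaClient client)
     {
       this.cache = cache;
       this.client = client;
     }
   
     // Poll at a fixed interval in a background thread to keep the cache synced.
     public void start(long periodSeconds)
     {
       exec.scheduleAtFixedRate(this::pollOnce, 0, periodSeconds, TimeUnit.SECONDS);
     }
   
     private void pollOnce()
     {
       SegmentDelta delta = client.getSegmentsSince(lastSyncTimestamp);
       if (delta.fullResyncRequired) {
         // Requested timestamp is older than the retained deletion window, or the
         // coordinator restarted / changed leader: resync all published segments
         // (clearing stale cache entries is omitted here for brevity).
         delta = client.getSegmentsSince(null);
       }
       delta.addedSegments.forEach(cache::addPublishedSegment);
       delta.deletedSegmentIds.forEach(cache::removePublishedSegment);
       lastSyncTimestamp = delta.newestCreatedDate;
     }
   }
   ```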
   
   ## New or Changed Public Interface:
   A new REST endpoint will be added to the coordinator:
   ```
   GET /segments/{timestamp}
   ```
   
   Add a timestamp field to the `DataSegment` object, representing the `created_date` from the `druid_segments` table in the metadata store.
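   
   For illustration, the coordinator side of this endpoint could look roughly like the JAX-RS sketch below. The `SegmentDeltaStore` interface and all class names are hypothetical, and the real resource would live under the coordinator's existing path hierarchy:
   
   ```java
   import javax.ws.rs.GET;
   import javax.ws.rs.Path;
   import javax.ws.rs.PathParam;
   import javax.ws.rs.Produces;
   import javax.ws.rs.core.MediaType;
   import javax.ws.rs.core.Response;
   
   // Hypothetical store backing the delta computation (not an existing Druid class).
   interface SegmentDeltaStore
   {
     boolean isOlderThanRetainedWindow(String timestamp);
     Object deltaSince(String timestamp);   // added segments + deleted ids since the timestamp
     Object fullSnapshot();                 // all published segments, for a full resync
   }
   
   @Path("/segments")
   public class SegmentDeltaResource
   {
     private final SegmentDeltaStore deltaStore;
   
     public SegmentDeltaResource(SegmentDeltaStore deltaStore)
     {
       this.deltaStore = deltaStore;
     }
   
     // GET /segments/{timestamp}: segments added (created_date > timestamp) and
     // ids of segments deleted since then; falls back to a full snapshot when
     // the timestamp predates the retained window of deletions.
     @GET
     @Path("/{timestamp}")
     @Produces(MediaType.APPLICATION_JSON)
     public Response getDelta(@PathParam("timestamp") String timestamp)
     {
       if (deltaStore.isOlderThanRetainedWindow(timestamp)) {
         return Response.ok(deltaStore.fullSnapshot()).build();
       }
       return Response.ok(deltaStore.deltaSince(timestamp)).build();
     }
   }
   ```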
   
   ## Rejected Alternatives:
   These options were also considered for the coordinator API:
   1. The coordinator sends just the `ids` of the published segments instead of complete serialized `DataSegment` objects; the broker then does a diff to find which segments it does not have, and makes another call to get the details for those segments. This approach was rejected because the segment-id list can itself be quite large, which can cause a lot of network traffic between the coordinator and broker processes, so we may not achieve the performance improvement we are looking for.
   2. Add a new table, `druid_transactionlogs`, to the metadata store to keep track of segment additions and removals. The coordinator API can query this table when it receives a GET request from the broker for any timestamp, and can also query it to maintain its own cache. For example:
   
   | operation  | segment_id | timestamp |
   | ------------- | ------------- | ------------- |
   | add  | s1  |ts_0  |
   | disable  | s2  |ts_1  |
   | delete  | s3  |ts_1  |
   
   It can use write-ahead logging to handle failures/restarts in any process. While this approach is good for maintaining consistency between the coordinator and broker caches, as well as for fault tolerance, it may not deliver the speed improvement if the database is queried on each API invocation. Another challenge would be keeping the `druid_segments` and `druid_transactionlogs` tables in sync. Unless this is needed for broader use cases, it may not be worth the extra design and effort.
   
   
