Posted to dev@druid.apache.org by Prabhat Gupta <pr...@media.net> on 2019/02/19 10:18:20 UTC

Topic regex for kafka-indexing-service

Hey all,
Just a quick question: can someone point me to the mail thread and design
doc for kafka-indexing-service? I wanted to understand what complexities it
presents when adding support for reading from multiple topics in a single
datasource/supervisor, and what goals were in mind when this design was
chosen. Since our use case absolutely requires this, I was wondering if we
could change the code to achieve it, maybe by giving up some
features/guarantees. I can't find any active discussion on this topic,
so I am not sure whether this is being considered for future releases.

Thank you very much

-- 
Prabhat Kumar Gupta
Sr. Tech Lead, Data Eng.
Media.net
Ph.-9987776847

Re: Topic regex for kafka-indexing-service

Posted by Gian Merlino <gi...@apache.org>.
Hey Prabhat,

We wrote up a blog post a couple years back discussing the design:
https://imply.io/post/exactly-once-streaming-ingestion. A few of the key
PRs are:

- https://github.com/apache/incubator-druid/pull/2220 (original PR adding
the KafkaIndexTask)
- https://github.com/apache/incubator-druid/pull/2656 (original PR adding
the KafkaSupervisor, completing the feature)
- https://github.com/apache/incubator-druid/pull/4815 (PR updating both to
support incremental handoffs, a major design change)

As to complexities involved in reading from multiple topics into a single
datasource, the main area to look at would be KafkaDataSourceMetadata /
SeekableStreamDataSourceMetadata and all the things that track metadata
(look for usages of those classes). Most of them assume that each
datasource is reading from only a single topic. We wouldn't need to give up
any features or guarantees -- we'd just need to modify things from 1-1 to
1-many.
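
To make the 1-1 versus 1-many distinction concrete, here is a minimal
sketch of how the offset bookkeeping might change shape. Every name in it
(OffsetMetadataSketch, SingleTopicOffsets, MultiTopicOffsets, widen, merge)
is hypothetical and chosen for illustration; none of it is taken from
KafkaDataSourceMetadata or any other Druid class, and the real change would
touch considerably more code than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Druid codebase.
public class OffsetMetadataSketch {

    // 1-1 shape: a datasource tracks exactly one topic, so the partition id
    // alone is enough to identify an offset.
    static final class SingleTopicOffsets {
        final String topic;
        final Map<Integer, Long> offsetByPartition;

        SingleTopicOffsets(String topic, Map<Integer, Long> offsetByPartition) {
            this.topic = topic;
            this.offsetByPartition = new HashMap<>(offsetByPartition);
        }
    }

    // 1-many shape: the topic becomes part of the key, because partition 0
    // of one topic is unrelated to partition 0 of another.
    static final class MultiTopicOffsets {
        final Map<String, Map<Integer, Long>> offsetsByTopic = new HashMap<>();

        void put(String topic, int partition, long offset) {
            offsetsByTopic.computeIfAbsent(topic, t -> new HashMap<>())
                          .put(partition, offset);
        }

        // Returns null when nothing is recorded for this topic/partition.
        Long get(String topic, int partition) {
            Map<Integer, Long> partitions = offsetsByTopic.get(topic);
            return partitions == null ? null : partitions.get(partition);
        }

        // Returns a new instance holding this object's entries with the
        // other object's entries layered on top (the other side wins on
        // a conflicting topic/partition pair).
        MultiTopicOffsets merge(MultiTopicOffsets other) {
            MultiTopicOffsets merged = new MultiTopicOffsets();
            offsetsByTopic.forEach((topic, parts) ->
                parts.forEach((p, o) -> merged.put(topic, p, o)));
            other.offsetsByTopic.forEach((topic, parts) ->
                parts.forEach((p, o) -> merged.put(topic, p, o)));
            return merged;
        }
    }

    // Lifts existing single-topic metadata into the multi-topic shape.
    static MultiTopicOffsets widen(SingleTopicOffsets single) {
        MultiTopicOffsets multi = new MultiTopicOffsets();
        single.offsetByPartition.forEach((p, o) -> multi.put(single.topic, p, o));
        return multi;
    }

    public static void main(String[] args) {
        Map<Integer, Long> clicks = new HashMap<>();
        clicks.put(0, 100L);
        clicks.put(1, 250L);
        MultiTopicOffsets committed = widen(new SingleTopicOffsets("clicks", clicks));

        MultiTopicOffsets update = new MultiTopicOffsets();
        update.put("clicks", 0, 180L); // same partition id, topic "clicks"
        update.put("views", 0, 40L);   // same partition id, different topic

        MultiTopicOffsets merged = committed.merge(update);
        System.out.println(merged.offsetsByTopic);
    }
}
```

The point of the sketch is only that anything keyed by partition id alone
would need to be keyed by topic and partition instead.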

Gian
