Posted to dev@druid.apache.org by Nikita Dolgov <ja...@gmail.com> on 2018/12/30 07:58:32 UTC

Writing a Druid extension

I was experimenting with a Druid extension prototype and encountered some difficulties. The experiment is to build something like https://github.com/apache/incubator-druid/issues/3891 with gRPC. 

(1) Guava version

Druid relies on Guava 16.0.1, which is a very old version (~4 years old). My only guess is that some transitive dependency (Hadoop?) requires it. Even the earliest gRPC releases, from three years ago, required Guava 19.0. So my first question is whether there are any plans to upgrade Guava any time soon.

(2) Druid thread model for query execution

I played a little with calling org.apache.druid.server.QueryLifecycleFactory::runSimple under a debugger. The stack trace was too deep to reverse engineer easily, so I'd like to ask directly instead. Would it be possible to briefly explain how many threads (and from which thread pools) a broker node uses to process, say, a GroupBy query?

At the very least I'd like to know whether calling QueryLifecycleFactory::runSimple on a thread from some "query processing pool" is better than doing it on the IO thread that received the query.
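Whichever pool Druid itself uses, the general pattern of keeping a blocking call like runSimple off the IO thread can be sketched in plain Java. Everything below is a generic illustration, not Druid code: the runSimple stand-in and the pool size are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QueryOffloadSketch
{
  // Stand-in for a blocking query call such as QueryLifecycleFactory::runSimple.
  static String runSimple(String query)
  {
    return "result-of-" + query;
  }

  public static void main(String[] args)
  {
    // Dedicated pool: the (gRPC/Jetty) IO thread only hands off work here.
    ExecutorService queryPool = Executors.newFixedThreadPool(4);
    try {
      CompletableFuture<String> future =
          CompletableFuture.supplyAsync(() -> runSimple("groupBy-query"), queryPool);
      // A real IO thread would attach a callback (thenAccept) instead of
      // blocking; join() here is only to show the result in this sketch.
      System.out.println(future.join());
    }
    finally {
      queryPool.shutdown();
    }
  }
}
```

The point of the pattern is that the IO thread returns immediately after supplyAsync, so slow queries cannot starve the transport's event loop.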

(3) Yielder

Is it safe to assume that QueryLifecycleFactory::runSimple always returns a Yielder<org.apache.druid.data.input.Row>? QueryLifecycle omits generic types rather liberally when dealing with Sequence instances.

(4) Calcite integration

Presumably Avatica has an option to use protobuf encoding for the returned results. Is it true that Druid cannot use it?
On a related note, is there any chance something was written down about org.apache.druid.sql.calcite?

Thank you


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@druid.apache.org
For additional commands, e-mail: dev-help@druid.apache.org


Re: Writing a Druid extension

Posted by Gian Merlino <gi...@gmail.com>.
Some other comments,

For 3) it is not safe to assume that QueryLifecycleFactory::runSimple
always returns org.apache.druid.data.input.Row. It does for groupBy queries,
but not for other query types. The SQL layer has a bunch of code to adapt
the various query types' return types into the uniform format required by
SQL; check out QueryMaker.
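QueryMaker is the real class to read for this; the shape of the adaptation can be illustrated generically with a per-query-type converter map. The adapters and row shapes below are hypothetical, purely to show the idea of normalizing heterogeneous native results into one row format.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ResultAdapterSketch
{
  public static void main(String[] args)
  {
    // Each native query type returns its own result shape; a SQL-style layer
    // maps every shape to a uniform Object[] row. These adapters are made up.
    Map<String, Function<Object, Object[]>> adapters = new HashMap<>();
    // Timeseries-like: a bare value that gets paired with a timestamp column.
    adapters.put("timeseries", r -> new Object[]{"t0", r});
    // GroupBy-like: already row-shaped, passed through.
    adapters.put("groupBy", r -> (Object[]) r);

    Object[] groupByRow = adapters.get("groupBy").apply(new Object[]{"dim", 42L});
    Object[] timeseriesRow = adapters.get("timeseries").apply(7L);

    System.out.println(Arrays.toString(groupByRow));
    System.out.println(Arrays.toString(timeseriesRow));
  }
}
```

A gRPC extension that wants one wire format for all query types would need a similar normalization step, which is why reusing the SQL layer's adaptation is attractive.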

For 4) consider whether you only need to support Avatica, or whether you want
to support the basic HTTP endpoint too (SqlResource).

On Wed, Jan 2, 2019 at 10:34 AM Charles Allen
<ch...@snap.com.invalid> wrote:


Re: Writing a Druid extension

Posted by Charles Allen <ch...@snap.com.INVALID>.
We have a functional gRPC extension for brokers internally. Let me see if I
can get approval for releasing it.

For the explicit answers:

1) Guava 16

Yep, Druid is stuck on it due to Hadoop.
https://github.com/apache/incubator-druid/pull/5413 is the only outstanding
issue I know of that would allow a very wide swath of Guava versions to be
used. Once a solution for the same-thread executor service gets into place,
you should be able to modify your local deployment to whatever Guava
version fits with your indexing config.
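For anyone hitting the same-thread-executor gap in the meantime: plain Java can stand in for what older Guava exposed as MoreExecutors.sameThreadExecutor() (renamed directExecutor() in later Guava), with no Guava dependency at all. This is just a generic sketch, not the actual fix in PR 5413.

```java
import java.util.concurrent.Executor;

public class SameThreadExecutorSketch
{
  public static void main(String[] args)
  {
    // A same-thread Executor simply runs the task inline on the caller:
    // Runnable::run satisfies Executor's single execute(Runnable) method.
    Executor sameThread = Runnable::run;

    final String caller = Thread.currentThread().getName();
    sameThread.execute(() ->
        System.out.println("ran on caller thread: "
            + Thread.currentThread().getName().equals(caller)));
  }
}
```

Because the task runs inline, this sidesteps the Guava version entirely for code that only needs same-thread execution semantics.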

2) Group By thread processing

You picked the hardest one here :) There are all kinds of multi-threaded fun
that can show up when dealing with groupBy queries. If you want a good
dive into this, I suggest checking out
https://github.com/apache/incubator-druid/pull/6629, which will put you
straight into the weeds of it all.

3) Yielder / Sequence type safety

Yeah... I don't have any good info there other than "things aren't
currently broken". There are some really nasty and hacky type casts related
to by-segment sequences if you start digging around the code.

4) Calcite Proto

This is a great question. I imagine getting a Calcite Proto SQL endpoint
set up in an extension wouldn't be too hard, but I have not tried such a
thing. This one would probably be worth its own discussion thread
(maybe an issue?) on how to handle it.

You are on the right track!
Charles Allen

On Sat, Dec 29, 2018 at 11:59 PM Nikita Dolgov <ja...@gmail.com>
wrote:


Re: Writing a Druid extension

Posted by Charles Allen <ch...@snap.com.INVALID>.
https://github.com/apache/incubator-druid/pull/6798 Please check it out,
Nikita.

On Sat, Dec 29, 2018 at 11:59 PM Nikita Dolgov <ja...@gmail.com>
wrote:
