
Posted to dev@druid.apache.org by Gian Merlino <gi...@apache.org> on 2020/07/09 17:06:55 UTC

Druid + Presto?

Hey Druids,

I was wondering, is anyone on this list using Druid + Presto together? If
so, what does your architecture look like and which edition / flavor of
Presto and Druid connector are you using? What's your experience been like?
I'm asking since I'm starting to think about whether it makes sense to look
at ways to improve the integration between the two projects.

Gian

Re: Druid + Presto?

Posted by Parth Brahmbhatt <pb...@netflix.com.INVALID>.

The two forks are very different and the patches are not really shareable.
Features/improvements may get re-implemented, but the code has diverged
significantly enough that it is pretty much always a re-implementation. The
overall pushdown approach is also very different between the two forks, so it
is unlikely that the implementation can be shared.

> In the overall warehouse + Druid setup you're envisioning, would Druid be
> the main way of querying the tables that it stores? Or would they all be
> synced periodically from the warehouse into Druid, using the warehouse as a
> source of truth? I'm asking since I'm wondering how important it is to
> think about functionality that might help load datasources based on tables
> that are in the Presto metastore.

In most cases, when our users are building a custom viz they query Druid
directly for the tables that it stores, with batch jobs that sync data from
the warehouse to Druid. Druid is never the source of truth, as it is always
derived from a warehouse table. However, the cost of building a custom viz
is generally higher, and currently there is no good/reliable way to build a
Tableau dashboard that queries Druid directly. It is hard to say what the
users might do in the future, but I would say it largely depends on how
performant and comprehensive this new route of Druid connectivity through
Presto turns out to be. Personally I would expect a lot more Tableau
dashboards, and thus more Druid tables being queried from Presto, if we
cover most query patterns in a performant way.

For loading datasources based on tables that are in Presto, at some point
we may develop INSERT INTO support for the Druid connector via either
Presto or Spark. Right now we just run the Hadoop batch indexer.

> Druid SQL is ANSI SQL for the most part but there are two big
> differences. First, it doesn't support everything in ANSI SQL (two
> examples: it currently doesn't support shuffle joins and windowed
> aggregations). Second, it supports some functionality that is not in ANSI
> SQL (like the TIME_ and DS_ operators). So it is smaller in some ways and
> bigger in other ways. I was thinking a reverse translator could let you
> write a Druid SQL query that uses our special operators, but also requires
> a shuffle join, and then translate and execute it as an equivalent Presto
> SQL query. The idea being you can express your query in either dialect and
> get routed to the right place in the end.

I don't see a use case on our platform for Druid-to-Presto connectivity.
Most of Druid's special operators have equivalent Presto functions, and for
missing operators we could just add connector-level procedures in Presto.
From the user's standpoint, as long as we offer acceptable performance and a
full set of Druid features (including scans down the line, when we can
support partial agg pushdown in prestosql), we don't see a reason to support
the Druid-to-Presto route, assuming Presto should be a superset for both
syntax and feature set.

Thanks
Parth



On Fri, Jul 10, 2020 at 9:36 AM Mainak Ghosh <mg...@twitter.com> wrote:

> + Zhenxiao
>
> On Jul 9, 2020, at 11:48 PM, Gian Merlino <gi...@apache.org> wrote:
>
> One other thing I'm wondering is how similar are the two forks of Presto?
> Are patches generally being shared between them or are they going off in
> different directions? One example: as I understand it, aggregate pushdown
> support was added to the core of both forks relatively recently — within
> the last year or so — does it work the same way in each one? I'm wondering
> how much work can be shared between these different efforts and perhaps
> between these efforts and the Druid project itself.
>
> On Thu, Jul 9, 2020 at 11:24 PM Gian Merlino <gi...@apache.org> wrote:
>
>> Hey Samarth,
>>
>> Thanks for sharing these details.
>>
>> In the overall warehouse + Druid setup you're envisioning, would Druid be
>> the main way of querying the tables that it stores? Or would they all be
>> synced periodically from the warehouse into Druid, using the warehouse as a
>> source of truth? I'm asking since I'm wondering how important it is to
>> think about functionality that might help load datasources based on tables
>> that are in the Presto metastore.
>>
>> > You bring up an interesting idea on the reverse connector. What do you
>> > think the value of such a connector will be? I am assuming Druid SQL for
>> > the most part is ANSI SQL.
>>
>> Druid SQL is ANSI SQL for the most part but there are two big
>> differences. First, it doesn't support everything in ANSI SQL (two
>> examples: it currently doesn't support shuffle joins and windowed
>> aggregations). Second, it supports some functionality that is not in ANSI
>> SQL (like the TIME_ and DS_ operators). So it is smaller in some ways and
>> bigger in other ways. I was thinking a reverse translator could let you
>> write a Druid SQL query that uses our special operators, but also requires
>> a shuffle join, and then translate and execute it as an equivalent Presto
>> SQL query. The idea being you can express your query in either dialect and
>> get routed to the right place in the end.
>>
>> On Thu, Jul 9, 2020 at 4:36 PM Samarth Jain <sa...@apache.org> wrote:
>>
>>> Gian,
>>>
>>> For the presto-sql version of Druid connector, for V1, we decided to
>>> pursue the JDBC route. You can follow along on the progress here -
>>> https://github.com/prestosql/presto/issues/1855
>>> My colleague, Parth (cc'ed as well) is working on implementing Druid
>>> aggregation push down including support for top-n style queries. Our
>>> immediate use cases, and what we think Druid generally is more suitable
>>> for, is for solving for aggregate group by style queries. Having a
>>> presto-druid connector also enables us to join data in Druid with the
>>> rest of our warehouse.
>>> In general though, for queries that don't do any aggregations i.e. which
>>> get translated to Druid SCAN queries, it makes sense to by-pass the Druid
>>> datanodes altogether and directly go to the deep storage. I think Druid
>>> provides enough metadata about the active segment files to be able to do
>>> that relatively easily.
>>>
>>> You bring up an interesting idea on the reverse connector. What do you
>>> think the value of such a connector will be? I am assuming Druid SQL for
>>> the most part is ANSI SQL.
>>>
>>> On Thu, Jul 9, 2020 at 12:56 PM Zhenxiao Luo <zl...@twitter.com.invalid>
>>> wrote:
>>>
>>> > Thank you, Mainak.
>>> >
>>> > Hi Gian,
>>> >
>>> > Glad to see you are interested in Presto Druid connector.
>>> >
>>> > My colleague, @Hao Luo <hl...@twitter.com> @Beinan Wang
>>> > <be...@twitter.com> and
>>> > me, together, implemented the Presto Druid connector in PrestoDB:
>>> > https://prestodb.io/docs/current/connector/druid.html
>>> >
>>> > Our implementation includes:
>>> > 1. Presto could scan Druid segments to compute SQL results
>>> > 2. aggregation pushdown, where Presto leverages Druid fast aggregation
>>> > capabilities, and stream aggregated result from Druid
>>> > actually, we implemented 2 execution paths, users could use
>>> > configurations to control whether they'd like to scan segments or
>>> > pushdown all sub-queries to Druid
>>> >
>>> > We had run benchmarkings comparing Presto Druid connector with other
>>> > SQL engines. And are ready to run production workloads.
>>> >
>>> > Thanks,
>>> > Zhenxiao
>>> >
>>> > On Thu, Jul 9, 2020 at 12:40 PM Mainak Ghosh <mg...@twitter.com> wrote:
>>> >
>>> > > Hello Gian,
>>> > >
>>> > > We are currently testing the (other) Presto Druid connector at our
>>> > > end. It has aggregation push down support. Adding Zhenxiao to this
>>> > > thread since he is the primary developer of the connector. He can
>>> > > provide the kind of details you are looking for.
>>> > >
>>> > > Thanks,
>>> > > Mainak
>>> > >
>>> > > > On Jul 9, 2020, at 12:25 PM, Gian Merlino <gi...@apache.org> wrote:
>>> > > >
>>> > > > By the way, I see that the other Presto has a Druid connector too:
>>> > > > https://prestodb.io/docs/current/connector/druid.html. From the
>>> > > > docs it looks like it has different lineage and might even work
>>> > > > differently.
>>> > > >
>>> > > > On Thu, Jul 9, 2020 at 12:22 PM Gian Merlino <gi...@apache.org> wrote:
>>> > > >
>>> > > >> I was thinking of exploring ideas like pushing down aggregations,
>>> > > >> enabling Presto to query directly from deep storage (in cases where
>>> > > >> there aren't any interesting things to push down, this may be more
>>> > > >> efficient than querying Druid servers), enabling translation from
>>> > > >> Druid's SQL dialect to Presto's SQL dialect (a "reverse connector"),
>>> > > >> etc. Do you (or anyone else on this list) have any thoughts on any
>>> > > >> of those?
>>> > > >>
>>> > > >> I'm also curious what kinds of improvements you're planning to the
>>> > > >> connector you built.
>>> > > >>
>>> > > >> On Thu, Jul 9, 2020 at 10:18 AM Samarth Jain <samarth.jain@gmail.com> wrote:
>>> > > >>
>>> > > >>> Hi Gian,
>>> > > >>>
>>> > > >>> I contributed the jdbc based presto-druid connector in prestosql
>>> > > >>> which went out in release 337
>>> > > >>> https://prestosql.io/docs/current/release/release-337.html. The v1
>>> > > >>> version of the connector doesn’t support aggregate push down yet.
>>> > > >>> It is being actively worked on and we expect it to be improved over
>>> > > >>> the next few releases. We are currently evaluating using the
>>> > > >>> presto-druid connector in our Tableau setup. It would be
>>> > > >>> interesting to see what changes in Druid would be needed to support
>>> > > >>> that integration.
>>> > > >>>
>>> > > >>> Thanks,
>>> > > >>> Samarth
>>> > > >>>
>>> > > >>> On Thu, Jul 9, 2020 at 10:07 AM Gian Merlino <gi...@apache.org> wrote:
>>> > > >>>
>>> > > >>>> Hey Druids,
>>> > > >>>>
>>> > > >>>> I was wondering, is anyone on this list using Druid + Presto
>>> > > >>>> together? If so, what does your architecture look like and which
>>> > > >>>> edition / flavor of Presto and Druid connector are you using?
>>> > > >>>> What's your experience been like? I'm asking since I'm starting to
>>> > > >>>> think about whether it makes sense to look at ways to improve the
>>> > > >>>> integration between the two projects.
>>> > > >>>>
>>> > > >>>> Gian
>>> > > >>>>
>>> > > >>>
>>> > > >>
>>> > >
>>> > >
>>> >
>>>
>>
>

Re: Druid + Presto?

Posted by Mainak Ghosh <mg...@twitter.com.INVALID>.

+ Zhenxiao

Re: Druid + Presto?

Posted by Gian Merlino <gi...@apache.org>.

One other thing I'm wondering is how similar are the two forks of Presto?
Are patches generally being shared between them or are they going off in
different directions? One example: as I understand it, aggregate pushdown
support was added to the core of both forks relatively recently — within
the last year or so — does it work the same way in each one? I'm wondering
how much work can be shared between these different efforts and perhaps
between these efforts and the Druid project itself.

Re: Druid + Presto?

Posted by Gian Merlino <gi...@apache.org>.

Hey Samarth,

Thanks for sharing these details.

In the overall warehouse + Druid setup you're envisioning, would Druid be
the main way of querying the tables that it stores? Or would they all be
synced periodically from the warehouse into Druid, using the warehouse as a
source of truth? I'm asking since I'm wondering how important it is to
think about functionality that might help load datasources based on tables
that are in the Presto metastore.

> You bring up an interesting idea on the reverse connector. What do you
> think the value of such a connector will be? I am assuming Druid SQL for
> the most part is ANSI SQL.

Druid SQL is ANSI SQL for the most part but there are two big differences.
First, it doesn't support everything in ANSI SQL (two examples: it
currently doesn't support shuffle joins and windowed aggregations). Second,
it supports some functionality that is not in ANSI SQL (like the TIME_ and
DS_ operators). So it is smaller in some ways and bigger in other ways. I
was thinking a reverse translator could let you write a Druid SQL query
that uses our special operators, but also requires a shuffle join, and then
translate and execute it as an equivalent Presto SQL query. The idea being
you can express your query in either dialect and get routed to the right
place in the end.
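
To make the reverse-translator idea a bit more concrete, here is a minimal
Python sketch of one rewrite step: turning Druid SQL's TIME_FLOOR into
Presto's date_trunc. TIME_FLOOR and date_trunc are existing functions in the
two dialects; everything else (the function name druid_to_presto, the
PERIOD_TO_UNIT table, the regex-based approach) is an assumption made for
illustration only. A real translator would work on a parsed query plan, not
on text, and would also have to handle the optional origin and time zone
arguments that this sketch ignores.

```python
import re

# Hypothetical mapping from the ISO-8601 periods accepted by Druid's
# TIME_FLOOR to Presto date_trunc units. Only a few simple periods are listed.
PERIOD_TO_UNIT = {
    "PT1M": "minute",
    "PT1H": "hour",
    "P1D": "day",
    "P1M": "month",
    "P1Y": "year",
}

# Matches the two-argument form TIME_FLOOR(<simple expr>, '<period>').
# Nested calls and the optional origin/timezone arguments are not handled.
TIME_FLOOR = re.compile(
    r"TIME_FLOOR\(\s*([^,()]+?)\s*,\s*'([^']+)'\s*\)", re.IGNORECASE
)


def druid_to_presto(sql: str) -> str:
    """Rewrite simple TIME_FLOOR calls into equivalent date_trunc calls."""

    def rewrite(match: re.Match) -> str:
        expr, period = match.group(1), match.group(2)
        unit = PERIOD_TO_UNIT.get(period.upper())
        if unit is None:
            # Refuse to guess: an unmapped period is reported, not approximated.
            raise ValueError(f"unsupported period: {period}")
        return f"date_trunc('{unit}', {expr})"

    return TIME_FLOOR.sub(rewrite, sql)


if __name__ == "__main__":
    druid_sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS h, COUNT(*) "
        "FROM events GROUP BY 1"
    )
    print(druid_to_presto(druid_sql))
```

The table name events is made up. Raising on unmapped periods is a
deliberate choice in this sketch: a query that cannot be translated
faithfully is better rejected than silently changed.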

Re: Druid + Presto?

Posted by Samarth Jain <sa...@apache.org>.

Gian,

For v1 of the PrestoSQL version of the Druid connector, we decided to pursue
the JDBC route. You can follow the progress here:
https://github.com/prestosql/presto/issues/1855
My colleague Parth (cc'ed) is working on implementing Druid
aggregation pushdown, including support for top-N style queries. Our
immediate use cases, and what we think Druid is generally best suited
for, are aggregate group-by style queries. Having a presto-druid connector
also enables us to join data in Druid with the rest of our warehouse.
In general though, for queries that don't do any aggregation, i.e. those that
get translated to Druid scan queries, it makes sense to bypass the Druid
data nodes altogether and go directly
to deep storage. I think Druid provides enough metadata about the
active segment files to make that relatively easy.
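
A minimal sketch of that routing idea, assuming a simplified query description and a simplified segment descriptor. The names Segment, Query, choose_route and segments_to_read are hypothetical and are not part of Druid or Presto; real segment metadata carries many more fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Segment:
    # Hypothetical descriptor: where the segment file lives and the
    # half-open time interval [start, end) it covers.
    path: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Query:
    # Hypothetical, simplified view of a query: its time interval
    # [start, end) and whether it contains any aggregation.
    start: datetime
    end: datetime
    has_aggregation: bool


def choose_route(query: Query) -> str:
    """Send aggregations to Druid; send plain scans to deep storage."""
    return "druid" if query.has_aggregation else "deep_storage"


def segments_to_read(query: Query, segments: List[Segment]) -> List[str]:
    """Return the paths of segments whose interval overlaps the query's."""
    return [
        s.path
        for s in segments
        if s.start < query.end and query.start < s.end
    ]


SEGMENTS = [
    Segment("seg/2020-07-07", datetime(2020, 7, 7), datetime(2020, 7, 8)),
    Segment("seg/2020-07-08", datetime(2020, 7, 8), datetime(2020, 7, 9)),
    Segment("seg/2020-07-09", datetime(2020, 7, 9), datetime(2020, 7, 10)),
]

scan_query = Query(datetime(2020, 7, 8), datetime(2020, 7, 9), has_aggregation=False)
route = choose_route(scan_query)
paths = segments_to_read(scan_query, SEGMENTS)
```

With the sample data, the scan query is routed to deep storage and only the one segment overlapping its interval is selected. The list of active segments is supplied by hand here; how it would be obtained from Druid is outside this sketch.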

You bring up an interesting idea with the reverse connector. What do you
think the value of such a connector would be? I am assuming Druid SQL is,
for the most part, ANSI SQL.

Re: Druid + Presto?

Posted by Mainak Ghosh <mg...@twitter.com.INVALID>.

+ Zhenxiao


Re: Druid + Presto?

Posted by Gian Merlino <gi...@apache.org>.

Hey Zhenxiao, Hao, Beinan, Mainak,

Thanks for sharing information about your work.

You mention benchmarks — I'm curious, did you have a chance to benchmark
each execution path? How do they look?

When you were developing the connector, did you feel like any changes in
Druid would make it easier to integrate things between the two projects?


Re: Druid + Presto?

Posted by Zhenxiao Luo <zl...@twitter.com.INVALID>.

Thank you, Mainak.

Hi Gian,

Glad to see you are interested in the Presto Druid connector.

My colleagues @Hao Luo <hl...@twitter.com> and @Beinan Wang
<be...@twitter.com> and I
implemented the Presto Druid connector in PrestoDB:
https://prestodb.io/docs/current/connector/druid.html

Our implementation includes:
1. Segment scan: Presto scans Druid segments to compute SQL results.
2. Aggregation pushdown: Presto leverages Druid's fast aggregation
capabilities and streams the aggregated results from Druid.
In other words, we implemented two execution paths, and users can use
configuration to control whether to scan segments or push down all sub-queries
to Druid.
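
To make the difference between the two paths concrete, here is a toy sketch in which in-memory rows stand in for Druid. It is not the connector's code: the flag pushdown_enabled and the functions druid_scan, druid_group_by_sum and sum_by are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Toy rows standing in for the contents of a Druid datasource.
ROWS = [
    {"country": "US", "clicks": 3},
    {"country": "US", "clicks": 4},
    {"country": "IN", "clicks": 5},
]


def druid_scan():
    """Stand-in for the scan path: every raw row leaves Druid."""
    return list(ROWS)


def druid_group_by_sum(dimension, metric):
    """Stand-in for the pushdown path: Druid aggregates, only results leave."""
    totals = defaultdict(int)
    for row in ROWS:
        totals[row[dimension]] += row[metric]
    return dict(totals)


def sum_by(dimension, metric, pushdown_enabled):
    """Return (totals, rows_transferred) for the chosen execution path."""
    if pushdown_enabled:
        totals = druid_group_by_sum(dimension, metric)
        return totals, len(totals)
    # Scan path: pull the raw rows and aggregate on the engine side.
    rows = druid_scan()
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals), len(rows)


pushed = sum_by("country", "clicks", pushdown_enabled=True)
scanned = sum_by("country", "clicks", pushdown_enabled=False)
```

Both paths return the same totals; what changes is where the aggregation runs and how many rows cross the boundary, which is the trade-off the configuration switch exposes.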

We have run benchmarks comparing the Presto Druid connector with other SQL
engines, and we are ready to run production workloads.

Thanks,
Zhenxiao


Re: Druid + Presto?

Posted by Mainak Ghosh <mg...@twitter.com.INVALID>.

Hello Gian,

We are currently testing the (other) Presto Druid connector at our end. It has aggregation push down support. Adding Zhenxiao to this thread since he is the primary developer of the connector. He can provide the kind of details you are looking for.

Thanks,
Mainak 




Re: Druid + Presto?

Posted by Hao Luo <hl...@twitter.com.INVALID>.

I wrote the Druid connector that reads directly from deep storage (the one
you found in PrestoDB). It understands the segment file format and reads
only the data needed for the query. I think it's possible to have
aggregation pushdown (and even more possibilities with pushdown of other
complex operations), but more work needs to be done on the Presto side to
take advantage of the indexes in the segments.
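
As a toy illustration of "reads only the data needed" and of what an index can buy, the sketch below uses a made-up columnar layout with a value-to-row-ids index. It is not the Druid segment format; SEGMENT and sum_where are hypothetical.

```python
# Toy columnar "segment": each column is stored separately, and one
# dimension has an inverted index mapping each value to its row ids.
SEGMENT = {
    "columns": {
        "country": ["US", "IN", "US", "DE"],
        "clicks": [3, 5, 4, 1],
        "user_agent": ["a", "b", "c", "d"],
    },
    "index": {
        "country": {"US": [0, 2], "IN": [1], "DE": [3]},
    },
}


def sum_where(segment, metric, dimension, value):
    """Sum `metric` over rows where `dimension` equals `value`.

    The index gives the matching row ids directly, so only the metric
    column is read, and only at those positions.
    """
    row_ids = segment["index"][dimension].get(value, [])
    column = segment["columns"][metric]
    return sum(column[i] for i in row_ids)


us_clicks = sum_where(SEGMENT, "clicks", "country", "US")
```

The user_agent column is never touched, and the country column itself is not scanned because the index already identifies the matching rows.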


Re: Druid + Presto?

Posted by Gian Merlino <gi...@apache.org>.

By the way, I see that the other Presto has a Druid connector too:
https://prestodb.io/docs/current/connector/druid.html. From the docs it
looks like it has different lineage and might even work differently.


Re: Druid + Presto?

Posted by Gian Merlino <gi...@apache.org>.

I was thinking of exploring ideas like pushing down aggregations, enabling
Presto to query directly from deep storage (in cases where there aren't any
interesting things to push down, this may be more efficient than querying
Druid servers), enabling translation from Druid's SQL dialect to Presto's
SQL dialect (a "reverse connector"), etc. Do you (or anyone else on this
list) have any thoughts on any of those?

I'm also curious what kinds of improvements you're planning for the
connector you built.


Re: Druid + Presto?

Posted by Samarth Jain <sa...@gmail.com>.

Hi Gian,

I contributed the JDBC-based presto-druid connector in PrestoSQL, which went
out in release 337:
https://prestosql.io/docs/current/release/release-337.html. The v1 version
of the connector doesn’t support aggregate pushdown yet. Pushdown is being
actively worked on, and we expect the connector to improve over the next few
releases. We are currently evaluating the presto-druid connector in
our Tableau setup. It would be interesting to see what changes in Druid
would be needed to support that integration.

Thanks,
Samarth
