Posted to dev@calcite.apache.org by Jesus Camacho Rodriguez <jc...@apache.org> on 2017/11/06 17:00:26 UTC

[DISCUSS] The state of the project - 2017

It has been a bit over two years since Calcite graduated to a top-level Apache project [1]. Back then, it was decided that every year there would be a "state of the project" discussion and a new PMC chair/VP would be chosen [2]. The time has come :)

The adoption of Calcite has continued to grow nicely over the last year. We continued improving support for querying all kinds of data, from semi-structured to streaming, and recently spatial/geographical/geometry data. Calcite can interact with more systems than ever before, and we already count more than 12 different adapters in our codebase. In turn, the wide adoption of Calcite is helping us to consolidate existing core code and extend the test coverage.

The dissemination of the project continued over the last year, with a strong presence of Calcite in talks at conferences and meetups. In addition, some members of the community are trying (for the second time :) ) to produce a paper describing the project, its architecture, and how different systems are using it [3]. There were several discussions last year about the difficulty of consuming Calcite documentation; we hope that this paper will serve as an initial formal reference for the project.

We also continued with a regular release cadence, which is representative of the health of the project as well as useful for the projects that consume the Calcite bits. Last week, CALCITE-2027 [4] was logged to drop support for Java 7. I think it is a great opportunity for the project to take another step forward, releasing Calcite 2.0 shortly after that, and deprecating some old APIs along the way.

We have a larger, more diverse group of committers and contributors than last year, coming from both industry and academia. Their contributions were not limited to code, as different members of the community took on the release manager role, spent time improving the documentation of the project, etc.
We probably still need to improve some aspects as a community. For instance, there were recent discussions about the participation of community members in one of the important tasks for the project: pull request reviews. This continuous engagement seems to be challenging for a project such as Calcite, as most of us work primarily on other projects that "consume" Calcite and we might spend more time involved in those projects. While this is difficult to change and I do not have any specific idea to improve it, it is important that we do our best to help and ensure that the project development does not stall.

I am not involved in the Avatica effort, but it has been great to see Avatica continue maturing, moving into its own repository and following with its own release cadence. Josh, Julian, if you want to add a few lines about the state of Avatica, that would be great.

Since we agreed to rotate the PMC chair every 12 months, I want to use this thread to start talking about a replacement too. It has been a privilege to serve as Calcite PMC chair during the last year. I wish I could have found more time to foster the project further: I truly believe in Calcite's vision and its value at the core of the development of open source data management systems and applications.
Which candidates would like to step up? In my opinion, Michael Mior, if he is willing to accept, would be a great candidate. He has been engaged with the Calcite community in different roles: writing code and documentation, reviewing PRs, answering questions on the mailing lists, and acting as release manager for 1.14, among others.

Lastly, as Julian asked last year:
1) What else are we doing well in the project?
2) What are the areas where we need to do better?
Please take some time to share your thoughts about the state of the project.

-Jesús

[1] http://calcite.apache.org/news/2015/10/22/calcite-graduates/

[2] http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/201509.mbox/%3CCF8D6F96-706F-4502-B41D-0689E357209D%40apache.org%3E

[3] http://issues.apache.org/jira/browse/CALCITE-2024

[4] http://issues.apache.org/jira/browse/CALCITE-2027

Re: [DISCUSS] The state of the project - 2017

Posted by Riccardo Tommasini <ri...@polimi.it>.

Hello all,
My name is Riccardo Tommasini (Politecnico di Milano).
It’s nice to see all these activities and I’m glad that Calcite is finally attracting other people in academia.

On my side, in collaboration with the University of Bolzano/Bozen, we have been using Calcite for ontology-based data access (over streams) for a year and a half.

We are planning 2 publications next spring, and we are currently running the evaluations.
Honestly, I cannot commit to coordinating the area, but I will share any results here.

Native SPARQL support is also an idea we are considering, especially in relation to the extension for processing RDF streams (RSP-QL).

On 7 Nov 2017, 05:25 +0100, Edmon Begoli <eb...@gmail.com>, wrote:
Couple of things to add:

1) Related to academic or just a general research perspective - I volunteer
to coordinate this area, hopefully together with you Michael, Jesus, and
whoever else is interested. Calcite is a great platform to enable this
(modularity, expansiveness, etc.)
Since you guys are already busy with the PMC roles, I can take a lead, and
we can rotate yearly. In terms of output, I propose that we aim for 1-2
strong papers out every year (SIGMOD, VLDB, PODS, etc.), which will
accumulate to a nice body of work. We could try to publish some of these in
collaboration with downstream projects such as Flink, etc.

2) I would also like to propose a new component for Calcite, focused on
scientific data, that could evolve over the next 12-18 months, starting with
support for genomic and related storage managers (TileDB, SciDB, SAMTools,
etc.), with a specific aim at the life and medical sciences. This is, in my
opinion, the next big frontier, in addition to the current geospatial focus.

3) I think we'll get the geospatial stuff done by the end of 2018.

Best,
Edmon

Re: [DISCUSS] The state of the project - 2017

Posted by Edmon Begoli <eb...@gmail.com>.

Couple of things to add:

1) Related to academic or just a general research perspective - I volunteer
to coordinate this area, hopefully together with you Michael, Jesus, and
whoever else is interested. Calcite is a great platform to enable this
(modularity, expansiveness, etc.)
Since you guys are already busy with the PMC roles, I can take a lead, and
we can rotate yearly. In terms of output, I propose that we aim for 1-2
strong papers out every year (SIGMOD, VLDB, PODS, etc.), which will
accumulate to a nice body of work. We could try to publish some of these in
collaboration with downstream projects such as Flink, etc.

2) I would also like to propose a new component for Calcite, focused on
scientific data, that could evolve over the next 12-18 months, starting with
support for genomic and related storage managers (TileDB, SciDB, SAMTools,
etc.), with a specific aim at the life and medical sciences. This is, in my
opinion, the next big frontier, in addition to the current geospatial focus.

3) I think we'll get the geospatial stuff done by the end of 2018.

Best,
Edmon

On Mon, Nov 6, 2017 at 7:46 PM, Michael Mior <mm...@uwaterloo.ca> wrote:

> Jesús,
>
> I'm happy to step in as PMC chair for next year if others are comfortable with
> that. As far as the answers to your questions, a few thoughts below.
>
> 1) I think it's great to see continued growth in new contributors. For such
> a widely used project, I've never seen new committers be onboarded so
> quickly. It's great to see the scope and diversity of use cases for Calcite
> expanding. Although preliminary, things like adding geospatial queries are
> opening up a lot of new doors for Calcite and I'm interested to see where
> this goes.
>
> 2) I'd like to see Calcite get more use in academic research. Hopefully the
> paper Edmon is currently leading will contribute to that effort. I think
> Calcite can make it much easier to prototype query optimizations than
> poking around the internals of Postgres or MySQL. (Disclaimer: It's been a
> while since I've looked at either of these projects.)
>
> Also, it would be nice to have the CI process become more stable, whether
> through improvements to the current Jenkins infrastructure or the work
> Christian is doing on getting things running smoothly on Travis CI.
> Furthermore, I don't think the integration tests are run as often as they
> should be since they can be a little onerous to set up. I've mentioned before
> I think Docker could be a better fit than the current VM solution,
> especially if we're able to have separate containers for each service so
> they can easily be tested individually.
>
> Although my answer to question 2 was a bit more verbose, that shouldn't be
> interpreted negatively. Although I haven't been involved with Calcite for
> very long, I've been impressed with the upward trajectory and I'm sure that
> will continue!
>
> Cheers,
> --
> Michael Mior
> mmior@apache.org

Re: [DISCUSS] The state of the project - 2017

Posted by Michael Mior <mm...@uwaterloo.ca>.

Jesús,

I'm happy to step in as PMC chair for next year if others are comfortable with
that. As far as the answers to your questions, a few thoughts below.

1) I think it's great to see continued growth in new contributors. For such
a widely used project, I've never seen new committers be onboarded so
quickly. It's great to see the scope and diversity of use cases for Calcite
expanding. Although preliminary, things like adding geospatial queries are
opening up a lot of new doors for Calcite and I'm interested to see where
this goes.

2) I'd like to see Calcite get more use in academic research. Hopefully the
paper Edmon is currently leading will contribute to that effort. I think
Calcite can make it much easier to prototype query optimizations than
poking around the internals of Postgres or MySQL. (Disclaimer: It's been a
while since I've looked at either of these projects.)

Also, it would be nice to have the CI process become more stable, whether
through improvements to the current Jenkins infrastructure or the work
Christian is doing on getting things running smoothly on Travis CI.
Furthermore, I don't think the integration tests are run as often as they
should be since they can be a little onerous to set up. I've mentioned before
I think Docker could be a better fit than the current VM solution,
especially if we're able to have separate containers for each service so
they can easily be tested individually.

Although my answer to question 2 was a bit more verbose, that shouldn't be
interpreted negatively. Although I haven't been involved with Calcite for
very long, I've been impressed with the upward trajectory and I'm sure that
will continue!

Cheers,
--
Michael Mior
mmior@apache.org

Re: [DISCUSS] The state of the project - 2017

Posted by Edmon Begoli <eb...@gmail.com>.

Why don’t we open a separate thread and have people volunteer for (and
propose) the roles they could play on the project?

Related to Julian’s previous note, and related to Calcite’s academic and
research community users — great idea.

I could clearly see Calcite being a platform for innovation and ongoing
research as an open, modular query processing platform.

SQL is still alive and well, but I could see Calcite emerging as a
polyglot query processor, and a front-end for non-relational but
structured data, and for relational/non-relational querying scenarios.
Plenty of room for advancements there.

On that note, we (my ORNL team) just had a paper accepted to a Polystore
workshop. The paper is titled “An Apache Calcite-based Polystore Variation for
Federated Querying of Heterogeneous Healthcare Sources”.
We wanted to demonstrate to our colleagues, especially our close
collaborators from the BigDAWG team, that Calcite could be a viable
polyglot query processor for federated data access from diverse
sources.

I intend to continue this research and keep spreading the word in this area,
with a specific focus on life sciences and medical data.

On Wed, Nov 8, 2017 at 17:50 Michael Mior <mm...@uwaterloo.ca> wrote:

> Interesting thoughts about the paper you pointed to, Julian. I believe I
> read it some time ago, but I'll have to dust it off and think about it in
> the context of Calcite. All your other thoughts also sound like exciting
> directions for Calcite.
>
> I hope we can all find ways to take some of the burden off your shoulders.
> While I am happy to serve as PMC chair, I'm still working on familiarizing
> myself with the code base to the point where I can more quickly review PRs
> with some level of confidence. (For the time being, I'm also not actively
> using Calcite.) I wonder if others would be willing to step up to "own"
> parts of the code base (e.g. as Josh does in many ways with Avatica). I
> think if we could have the majority of components on JIRA assigned by
> default to someone other than you, that might be a start. Of course,
> practically speaking so much is contained within core, that this might have
> marginal impact. We could also consider (on JIRA only) creating some
> additional components to further partition things.
>
> I forgot when I was thinking about CI that you have your own build suite
> running for the project which is much appreciated :) But I'm sure we would
> both agree that it would be nice if this extra testing wasn't resting
> solely on you. I'll start a separate thread when I have time to start
> hacking on CI-related things to get some more input.
>
> --
> Michael Mior
> mmior@apache.org
>
> 2017-11-08 16:34 GMT-05:00 Julian Hyde <jh...@apache.org>:
>
> > Thanks for starting this discussion, Jesus. Here are some thoughts, in
> > no particular order.
> >
> > I too have noticed the increase in academic adoption. This is
> > excellent. Shall we add a section to the "Powered by" page [1] on
> > academic projects and papers?
> >
> > I worry a lot about audience (or audiences). Who is using Calcite? Are
> > we giving them what they need? Data engines (such as Drill, Hive and
> > Flink) are one category, and I think they are fairly well served.
> > Academics are another audience; some are succeeding, but I wonder
> > whether it would be easier for them if we had some relevant examples,
> > such as how to parse a query and optimize it using several different
> > cost models and combinations of rules. What other audiences are there?
> >
> > There is an audience who would like to use Calcite as a standalone
> > engine; and folks who would like to incorporate materialized views,
> > indexes and constraints into their engine but prefer to speak SQL
> > rather than Java APIs. Those groups are not well served today. I am
> > working on a server which has DDL support[2][3]; it would provide a
> > (simple) standalone engine, but also allow us to demo materialized
> > views, virtual columns, check constraints and foreign tables/schemas
> > via SQL so that people building engines can more easily grasp the
> > concepts.
> >
> > I read Trummer & Koch's paper "Multi-objective parametric query
> > optimization" [4] in CACM recently. It is a very exciting advance, and
> > too much to cover in this thread, but it got me thinking about how
> > Calcite could evolve to incorporate their ideas. I realize that giving
> > RelOptCost multiple fields was a mistake, unless we also add the
> > mechanics (piecewise-linear cost functions and polytopes) to handle
> > them. The vast majority of Calcite remains applicable, so this would
> > be evolutionary: Calcite's rules and algebra emerge intact in the new
> > order, and Calcite's metadata framework can model the new cost
> > functions. Extending Calcite could raise some interesting research
> > topics: is it possible to extend the parameter space (either the
> > number of parameters or the value range of those parameters) after
> > initial planning? Can we use parameters to model whether
> > intermediate results are materialized (see [5]) or whether ephemeral
> > materialized views happen to be present in cache? What new statistics
> > do we need to gather to power the new cost functions? There is enough
> > here to interest several researchers.
> >
> > As for features:
> > * I would like to get to full compliance with OpenGIS, because spatial
> > support is much more straightforward in Calcite's algebraic approach
> > than in engines which need to build a new data structure.
> > * I also would like to give users a choice of engines in Calcite:
> > Spark and perhaps something based on Arrow, in addition to the
> > existing Enumerable engine.
> > * I would like to continue to make the planner more modular, so that
> > people can supply a program (a collection of rules organized into
> > planning phases) and basically just say "go".
> > * And I plan to continue my work to make data systems learn and adapt,
> > creating and populating materialized views based on observed query
> > patterns and data statistics.
> >
> > Regarding governance. I think we are functioning well as a
> > meritocratic community. High-quality contributions arrive from people
> > who have never contributed before; this is happening more and more
> > frequently, which is really excellent. On the other hand, this
> > increases the load for reviewing (and pro-actively fixing)
> > contributions, and too much of that work still falls on my shoulders.
> > There are times when I get close to burn out, especially when people
> > explicitly direct questions and pull requests at me.
> >
> > I think Michael would be an excellent PMC chair. I am delighted that
> > he is prepared to do the job.
> >
> > Regarding CI. There is a bit more CI going on than meets the eye; I
> > run several tests nightly on my home server, and also on a Windows VM,
> > and speak up if things get broken. But I admit there has been bit-rot
> > in some of the adapters, and having a public CI for those adapters
> > would be useful, if we can do so without generating too much noise.
> >
> > Julian
> >
> > [1] https://calcite.apache.org/docs/powered_by.html
> >
> > [2] https://issues.apache.org/jira/browse/CALCITE-707
> >
> > [3] https://issues.apache.org/jira/browse/CALCITE-1991
> >
> > [4] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract
> >
> > [5] https://issues.apache.org/jira/browse/CALCITE-481

Re: [DISCUSS] The state of the project - 2017

Posted by Julian Hyde <jh...@apache.org>.

I agree. Let’s make it actionable: create a JIRA case, and to complete the task we should add the list of components and their owners in one of the web pages.

We should make the list of components line up with the components in JIRA. I don’t think it’s very important how we slice up the project into components: components do not always correspond to a particular Java package or piece of code, but more often to an area of functionality.

Julian


> On Nov 9, 2017, at 6:30 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
> Michael,
> 
> I think the ownership thinking is a really good idea. Things like trait
> behaviors, volcano, types, hep, decorrelation, parsing, sql-to-rel, and
> materialized views are all good chunks that could be owned by someone (in
> addition to Avatica and each of the connectors).

Re: [DISCUSS] The state of the project - 2017

Posted by Jacques Nadeau <ja...@apache.org>.

Michael,

I think the ownership thinking is a really good idea. Things like trait
behaviors, volcano, types, hep, decorrelation, parsing, sql-to-rel, and
materialized views are all good chunks that could be owned by someone (in
addition to Avatica and each of the connectors).

On Wed, Nov 8, 2017 at 2:50 PM, Michael Mior <mm...@uwaterloo.ca> wrote:

> Interesting thoughts about the paper you pointed to, Julian. I believe I
> read it some time ago, but I'll have to dust it off and think about it in
> the context of Calcite. All your other thoughts also sound like exciting
> directions for Calcite.
>
> I hope we can all find ways to take some of the burden off your shoulders.
> While I am happy to serve as PMC chair, I'm still working on familiarizing
> myself with the code base to the point where I can more quickly review PRs
> with some level of confidence. (For the time being, I'm also not actively
> using Calcite.) I wonder if others would be willing to step up to "own"
> parts of the code base (e.g. as Josh does in many ways with Avatica). I
> think if we could have the majority of components on JIRA assigned by
> default to someone other than you, that might be a start. Of course,
> practically speaking so much is contained within core, that this might have
> marginal impact. We could also consider (on JIRA only) creating some
> additional components to further partition things.
>
> I forgot when I was thinking about CI that you have your own build suite
> running for the project which is much appreciated :) But I'm sure we would
> both agree that it would be nice if this extra testing wasn't resting
> solely on you. I'll start a separate thread when I have time to start
> hacking on CI-related things to get some more input.
>
> --
> Michael Mior
> mmior@apache.org

Re: [DISCUSS] The state of the project - 2017

Posted by Michael Mior <mm...@uwaterloo.ca>.

Interesting thoughts about the paper you pointed to, Julian. I believe I
read it some time ago, but I'll have to dust it off and think about it in
the context of Calcite. All your other thoughts also sound like exciting
directions for Calcite.

I hope we can all find ways to take some of the burden off your shoulders.
While I am happy to serve as PMC chair, I'm still working on familiarizing
myself with the code base to the point where I can more quickly review PRs
with some level of confidence. (For the time being, I'm also not actively
using Calcite.) I wonder if others would be willing to step up to "own"
parts of the code base (e.g. as Josh does in many ways with Avatica). I
think if we could have the majority of components on JIRA assigned by
default to someone other than you, that might be a start. Of course,
practically speaking so much is contained within core, that this might have
marginal impact. We could also consider (on JIRA only) creating some
additional components to further partition things.

I forgot when I was thinking about CI that you have your own build suite
running for the project which is much appreciated :) But I'm sure we would
both agree that it would be nice if this extra testing wasn't resting
solely on you. I'll start a separate thread when I have time to start
hacking on CI-related things to get some more input.

--
Michael Mior
mmior@apache.org

2017-11-08 16:34 GMT-05:00 Julian Hyde <jh...@apache.org>:

> Thanks for starting this discussion, Jesus. Here are some thoughts, in
> no particular order.
>
> I too have noticed the increase in academic adoption. This is
> excellent. Shall we add a section to the "Powered by" page [1] on
> academic projects and papers?
>
> I worry a lot about audience (or audiences). Who is using Calcite? Are
> we giving them what they need? Data engines (such as Drill, Hive and
> Flink) are one category, and I think they are fairly well served.
> Academics are another audience; some are succeeding, but I wonder
> whether it would be easier for them if we had some relevant examples,
> such as how to parse a query and optimize it using several different
> cost models and combinations of rules. What other audiences are there?
>
> There is an audience who would like to use Calcite as a standalone
> engine; and folks who would like to incorporate materialized views,
> indexes and constraints into their engine but prefer to speak SQL
> rather than Java APIs. Those groups are not well served today. I am
> working on a server which has DDL support[2][3]; it would provide a
> (simple) standalone engine, but also allow us to demo materialized
> views, virtual columns, check constraints and foreign tables/schemas
> via SQL so that people building engines can more easily grasp the
> concepts.
>
> I read Trummer & Koch's paper "Multi-objective parametric query
> optimization" [4] in CACM recently. It is a very exciting advance, and
> too much to cover in this thread, but it got me thinking about how
> Calcite could evolve to incorporate their ideas. I realize that giving
> RelOptCost multiple fields was a mistake, unless we also add the
> mechanics (piecewise-linear cost functions and polytopes) to handle
> them. The vast majority of Calcite remains applicable, so this would
> be evolutionary: Calcite's rules and algebra emerge intact in the new
> order, and Calcite's metadata framework can model the new cost
> functions. Extending Calcite could raise some interesting research
> topics. Is it possible to extend the parameter space (either the
> number of parameters or the value range of those parameters) after
> initial planning? Can we use parameters to model whether
> intermediate results are materialized (see [5]) or whether ephemeral
> materialized views happen to be present in cache? What new statistics
> do we need to gather to power the new cost functions? There is enough
> here to interest several researchers.
>
> As for features:
> * I would like to get to full compliance with OpenGIS, because spatial
> support is much more straightforward in Calcite's algebraic approach
> than in engines which need to build a new data structure.
> * I also would like to give users a choice of engines in Calcite:
> Spark and perhaps something based on Arrow, in addition to the
> existing Enumerable engine.
> * I would like to continue to make the planner more modular, so that
> people can supply a program (a collection of rules organized into
> planning phases) and basically just say "go".
> * And I plan to continue my work to make data systems learn and adapt,
> creating and populating materialized views based on observed query
> patterns and data statistics.
>
> Regarding governance. I think we are functioning well as a
> meritocratic community. High-quality contributions arrive from people
> who have never contributed before; this is happening more and more
> frequently, which is really excellent. On the other hand, this
> increases the load for reviewing (and pro-actively fixing)
> contributions, and too much of that work still falls on my shoulders.
> There are times when I get close to burn out, especially when people
> explicitly direct questions and pull requests at me.
>
> I think Michael would be an excellent PMC chair. I am delighted that
> he is prepared to do the job.
>
> Regarding CI. There is a bit more CI going on than meets the eye; I
> run several tests nightly on my home server, and also on a Windows VM,
> and speak up if things get broken. But I admit there has been bit-rot
> in some of the adapters, and having a public CI for those adapters
> would be useful, if we can do so without generating too much noise.
>
> Julian
>
> [1] https://calcite.apache.org/docs/powered_by.html
>
> [2] https://issues.apache.org/jira/browse/CALCITE-707
>
> [3] https://issues.apache.org/jira/browse/CALCITE-1991
>
> [4] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract
>
> [5] https://issues.apache.org/jira/browse/CALCITE-481

Re: [DISCUSS] The state of the project - 2017

Posted by Julian Hyde <jh...@apache.org>.

Thanks for starting this discussion, Jesus. Here are some thoughts, in
no particular order.

I too have noticed the increase in academic adoption. This is
excellent. Shall we add a section to the "Powered by" page [1] on
academic projects and papers?

I worry a lot about audience (or audiences). Who is using Calcite? Are
we giving them what they need? Data engines (such as Drill, Hive and
Flink) are one category, and I think they are fairly well served.
Academics are another audience; some are succeeding, but I wonder
whether it would be easier for them if we had some relevant examples,
such as how to parse a query and optimize it using several different
cost models and combinations of rules. What other audiences are there?

There is an audience who would like to use Calcite as a standalone
engine; and folks who would like to incorporate materialized views,
indexes and constraints into their engine but prefer to speak SQL
rather than Java APIs. Those groups are not well served today. I am
working on a server which has DDL support[2][3]; it would provide a
(simple) standalone engine, but also allow us to demo materialized
views, virtual columns, check constraints and foreign tables/schemas
via SQL so that people building engines can more easily grasp the
concepts.

I read Trummer & Koch's paper "Multi-objective parametric query
optimization" [4] in CACM recently. It is a very exciting advance, and
too much to cover in this thread, but it got me thinking about how
Calcite could evolve to incorporate their ideas. I realize that giving
RelOptCost multiple fields was a mistake, unless we also add the
mechanics (piecewise-linear cost functions and polytopes) to handle
them. The vast majority of Calcite remains applicable, so this would
be evolutionary: Calcite's rules and algebra emerge intact in the new
order, and Calcite's metadata framework can model the new cost
functions. Extending Calcite could raise some interesting research
topics. Is it possible to extend the parameter space (either the
number of parameters or the value range of those parameters) after
initial planning? Can we use parameters to model whether
intermediate results are materialized (see [5]) or whether ephemeral
materialized views happen to be present in cache? What new statistics
do we need to gather to power the new cost functions? There is enough
here to interest several researchers.
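
To illustrate the RelOptCost point, here is a minimal sketch, assuming a
hypothetical two-field cost (cpu and io). PlanCost and ParetoDemo are
invented names for this example and are not Calcite's RelOptCost API, and
plain Pareto dominance stands in for the piecewise-linear machinery of the
paper. It shows that with several cost fields two plans can be
incomparable, so a planner has to keep a set of candidates until a
weighting of the fields is known:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical two-field cost for illustration; not Calcite's RelOptCost. */
final class PlanCost {
  final String plan;
  final double cpu;
  final double io;

  PlanCost(String plan, double cpu, double io) {
    this.plan = plan;
    this.cpu = cpu;
    this.io = io;
  }

  /** True if this cost is no worse on every field and strictly better on one. */
  boolean dominates(PlanCost other) {
    return cpu <= other.cpu && io <= other.io
        && (cpu < other.cpu || io < other.io);
  }
}

final class ParetoDemo {
  /** Keeps only the plans that no other candidate dominates. */
  static List<PlanCost> paretoFrontier(List<PlanCost> candidates) {
    List<PlanCost> frontier = new ArrayList<>();
    for (PlanCost c : candidates) {
      boolean dominated = false;
      for (PlanCost d : candidates) {
        if (d.dominates(c)) {
          dominated = true;
          break;
        }
      }
      if (!dominated) {
        frontier.add(c);
      }
    }
    return frontier;
  }

  /** Picks the cheapest frontier plan once the relative weight of io is known. */
  static PlanCost cheapest(List<PlanCost> frontier, double ioWeight) {
    PlanCost best = null;
    for (PlanCost c : frontier) {
      if (best == null
          || c.cpu + ioWeight * c.io < best.cpu + ioWeight * best.io) {
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Made-up numbers, chosen only to make the trade-off visible.
    List<PlanCost> plans = Arrays.asList(
        new PlanCost("hashJoin", 10, 100),
        new PlanCost("indexJoin", 40, 20),
        new PlanCost("nestedLoop", 50, 120));

    List<PlanCost> frontier = paretoFrontier(plans);
    for (PlanCost c : frontier) {
      System.out.println("on frontier: " + c.plan);
    }
    // The winner depends on a parameter that may not be known at planning time.
    System.out.println("io cheap:     " + cheapest(frontier, 0.1).plan);
    System.out.println("io expensive: " + cheapest(frontier, 1.0).plan);
  }
}
```

In this toy data neither hashJoin nor indexJoin dominates the other, and
which one wins flips as the io weight changes. Collapsing the fields into a
single number too early would discard one of them.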

As for features:
* I would like to get to full compliance with OpenGIS, because spatial
support is much more straightforward in Calcite's algebraic approach
than in engines which need to build a new data structure.
* I also would like to give users a choice of engines in Calcite:
Spark and perhaps something based on Arrow, in addition to the
existing Enumerable engine.
* I would like to continue to make the planner more modular, so that
people can supply a program (a collection of rules organized into
planning phases) and basically just say "go".
* And I plan to continue my work to make data systems learn and adapt,
creating and populating materialized views based on observed query
patterns and data statistics.

Regarding governance. I think we are functioning well as a
meritocratic community. High-quality contributions arrive from people
who have never contributed before; this is happening more and more
frequently, which is really excellent. On the other hand, this
increases the load for reviewing (and pro-actively fixing)
contributions, and too much of that work still falls on my shoulders.
There are times when I get close to burn out, especially when people
explicitly direct questions and pull requests at me.

I think Michael would be an excellent PMC chair. I am delighted that
he is prepared to do the job.

Regarding CI. There is a bit more CI going on than meets the eye; I
run several tests nightly on my home server, and also on a Windows VM,
and speak up if things get broken. But I admit there has been bit-rot
in some of the adapters, and having a public CI for those adapters
would be useful, if we can do so without generating too much noise.

Julian

[1] https://calcite.apache.org/docs/powered_by.html

[2] https://issues.apache.org/jira/browse/CALCITE-707

[3] https://issues.apache.org/jira/browse/CALCITE-1991

[4] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract

[5] https://issues.apache.org/jira/browse/CALCITE-481

Re: [DISCUSS] The state of the project - 2017

Posted by Josh Elser <el...@apache.org>.

On 11/6/17 12:00 PM, Jesus Camacho Rodriguez wrote:
> I am not involved in the Avatica effort, but it has been great to see Avatica continue maturing, moving into its own repository and following with its own release cadence. Josh, Julian, if you want to add a few lines about the state of Avatica, that would be great.

Would be happy to :)

I've certainly been spending less time on core functionality. Avatica
has definitely passed the threshold of what most developers need. The
majority of users would find Avatica to be fully featured as a JDBC
interface (though some gaps still exist).

We've started to see a focus on non-JDBC drivers for Avatica, which is
a great sign. Our Francis has been making progress on trying to adopt
the driver written in Go into the Apache codebase. There are a few other
drivers available as well. The presence of these drivers, and their
ability to continue to function, is good validation of the
protocol/stability model that we outlined/implemented in the past 1-2 years.

Avatica is still fairly low-volume, with only a few people contributing. 
I'd love to see more people take an interest (it's a great stepping 
stone into Calcite too ;P).