Posted to dev@calcite.apache.org by Stamatis Zampetakis <za...@gmail.com> on 2022/07/05 21:43:59 UTC

Re: PR Review Request

I completely agree with Julian. The problem cannot be solved unless we
start investing more time in the project in the ways he already described.

What I outlined previously is an attempt to mitigate the current situation,
not something that can solve the problem for good. Nevertheless, to push
this forward I created a PR [1] with an initial sketch of the process. Feel
free to leave your comments there.

Best,
Stamatis

[1] https://github.com/apache/calcite/pull/2851

On Thu, Jun 23, 2022 at 8:34 PM Julian Hyde <jh...@gmail.com> wrote:

> +1 to Stamatis’ idea. It won’t make things worse. :)
>
> But to repeat what I said earlier. We need existing committers to pull
> their weight. If necessary, committers need to talk to their managers and
> get time allocated to contribute to “housekeeping”.
>
> One important kind of housekeeping is productization. That means not just
> getting features and bug fixes into Calcite, but adding sufficient
> documentation that users know they exist and how to use them. You may have
> noticed that I spend a lot of effort asking people to improve the subject
> and description of JIRA cases, and making sure that the commit message
> matches the JIRA subject. I do this because usually the only documentation
> of a feature is the line in the release notes and the JIRA case it links to.
>
> This effort is key to Calcite’s success, and quite a few committers don’t
> do it. If committers did a better job in this area, it would reduce the
> workload on me.
>
> Julian
>
>
>
> > On Jun 23, 2022, at 6:44 AM, Ruben Q L <ru...@gmail.com> wrote:
> >
> > +1 on Stamatis' idea, I think it could help with the current situation of
> > lack of reviewers.
> >
> > Best,
> > Ruben
> >
> >
> > On Thu, Jun 23, 2022 at 12:56 PM Charles Givre <cg...@gmail.com> wrote:
> >
> >> Hello all,
> >> FWIW, If a committer/reviewer shortage is the issue, I'd second
> Stamatis's
> >> recommendation.
> >> Best,
> >> -- C
> >>
> >>> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis <za...@gmail.com>
> >> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> How about granting Calcite committership to people who are already ASF
> >>> committers (in other projects) and have a proven record of working
> >>> with Calcite?
> >>>
> >>> Usually the PMC invites people to become committers to the project
> after
> >>> having a few successful code contributions in Calcite/Avatica repos.
> >>> This is to ensure that people are familiar with the codebase and
> >> understand
> >>> how the ASF works.
> >>>
> >>> People who are already committers in an ASF project already know how
> the
> >>> foundation works and how they should behave.
> >>> Also people working in projects like Drill, Flink, Hive, Ignite,
> Phoenix,
> >>> etc., may already be quite familiar with Calcite if they have worked on
> >> the
> >>> query processing layer of the system.
> >>>
> >>> It might be difficult for the Calcite PMC to identify people familiar
> >> with
> >>> Calcite if they don't contribute to the main Calcite/Avatica repos
> >>> regularly thus I would be open to consider people for committers on a
> per
> >>> request basis.
> >>>
> >>> Example:
> >>> Bob is an ASF committer in Flink and he has pushed various
> contributions
> >>> around Calcite in the Flink repo.
> >>> Bob feels confident about fixing trivial things in Calcite and he wants
> >> to
> >>> help with reviewing and merging open PRs.
> >>> Bob sends an email to private@calcite list requesting to become a
> >> Calcite
> >>> committer.
> >>> Bob explains in the email who he is and what he has done to demonstrate
> >> he
> >>> is familiar with the Calcite code.
> >>> The Calcite PMC acknowledges the request and starts a vote for granting
> >>> Calcite committership to Bob.
> >>> The Calcite PMC informs Bob about their decision and takes further
> >> actions
> >>> if necessary.
> >>>
> >>> If we agree on the overall idea we can figure out the details and
> >> formalize
> >>> the request process in our docs.
> >>>
> >>> Best,
> >>> Stamatis
> >>>
> >>> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang <be...@gmail.com>
> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> This is an awesome discussion about improving collaboration between
> >>>> different projects.
> >>>> Thanks to Julian, Jacques, Austin, Martijn, and Timo for their efforts
> >>>> to make it happen.
> >>>>
> >>>> Best,
> >>>> Jing Zhang
> >>>>
> >>>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser <ma...@apache.org> wrote:
> >>>>
> >>>>> Hi Jacques, Julian, Austin and everyone else,
> >>>>>
> >>>>> Thank you very much for sharing all your experiences and providing
> >> really
> >>>>> valuable input. I'll definitely relay this back to the original
> >>>> discussion
> >>>>> thread in the Flink community. Part of bringing this information back
> >> to
> >>>>> the Flink community is also because I feel like the only way that
> >>>> different
> >>>>> OSS solutions can help each other forward is by communicating and
> >>>>> collaborating. As Timo already mentioned, he'll try to help out.
> Let's
> >>>> try
> >>>>> to get some more involved.
> >>>>>
> >>>>> Side note: I also saw that this thread got some traction on Twitter
> [1]
> >>>> on
> >>>>> the cost of forking.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Martijn
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>>
> >>>>
> >>
> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
> >>>>>
> >>>>> On Wed, Jun 22, 2022 at 09:29, Timo Walther <twalthr@apache.org> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> This is a really great discussion. Thanks for starting it Martijn
> and
> >>>>>> your input Jacques! I have been fighting against forking Calcite in
> >>>>>> Flink for years already. Even when merging forks of Flink that
> >>>>>> transitively forked Calcite, in the end we were able to resolve
> >>>>>> conflicts / contribute blockers back into Calcite. And I strongly
> >>>>>> believe that this is the better approach for long-term success for
> >> both
> >>>>>> projects.
> >>>>>>
> >>>>>> I would like to get more involved in the Calcite community. I have
> >> been
> >>>>>> implementing and managing Flink SQL based on Calcite since 2016.
> Thus,
> >>>> I
> >>>>>> feel confident to say that I know the code base and some quirks in
> the
> >>>>>> stack very well.
> >>>>>>
> >>>>>> Capacity-wise I will try to reserve some time for helping the
> Calcite
> >>>>>> community. Happy to get some pointers where and how I can help.
> >>>>>>
> >>>>>> I will take a look at https://github.com/apache/calcite/pull/2606
> >> this
> >>>>>> week to get the ball rolling. As this is an important addition and
> >>>>>> prepares for "customer SQL operators" in Flink SQL.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>> On 21.06.22 22:18, Charles Givre wrote:
> >>>>>>> As the PMC for Apache Drill, I'd echo everyone's comments here....
> >>>>> Don't
> >>>>>> fork.   Don't do it.
> >>>>>>>
> >>>>>>> Apache Drill forked Calcite several years ago, when Calcite was on
> >>>>>>> version 1.20 or 1.21. While this meant that some bugs were easily
> >>>>>>> fixed, it also meant that as our fork diverged from "regular" Calcite,
> >>>>>>> it became harder and harder to maintain. It also meant that we were
> >>>>>>> chasing bugs that had since been fixed.
> >>>>>>>
> >>>>>>> Drill is in the process of "de-forking" Calcite, meaning that we're
> >>>>>> ditching our fork and re-integrating with standard Calcite.  It has
> >>>> been
> >>>>> A
> >>>>>> TON of work and we have contributed (and will continue to
> contribute)
> >>>> bug
> >>>>>> fixes and PRs to Calcite. In the long run, I think this will be
> >>>>> beneficial
> >>>>>> for both communities.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> -- C
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jun 21, 2022, at 1:57 PM, Julian Hyde <jh...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Please don’t fork Calcite.
> >>>>>>>>
> >>>>>>>> Calcite suffers from the tragedy of the commons. Unlike many open
> >>>>>> source data projects, there is no commercial project that directly
> >> maps
> >>>>> to
> >>>>>> Calcite (even though Calcite is an essential part of many projects).
> >>>> As a
> >>>>>> result no engineers work full-time on Calcite.
> >>>>>>>>
> >>>>>>>> It takes more than pull requests to keep a project going. We need
> >>>>>> reviewers, people to work on releases, people to fix bugs (such as
> >>>>> security
> >>>>>> bugs) that are important to everyone but urgent to no one.
> >>>>>>>>
> >>>>>>>> We have plenty of committers in Calcite, and add several more per
> >>>>> year.
> >>>>>> We rely on those committers taking on their share of the housework,
> >> but
> >>>>> the
> >>>>>> burden falls on too few people.
> >>>>>>>>
> >>>>>>>> Engineering managers need to start paying a little more for the
> >>>> “free
> >>>>>> lunch” that they enjoy when Calcite “just works” in their project.
> >>>> Sadly,
> >>>>>> most engineering managers are not subscribed to this list.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <ja...@apache.org>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Martijn, thanks for sharing that thread in the Flink community.
> >>>>>>>>>
> >>>>>>>>> I'm someone who has forked Calcite twice: once in Apache Drill
> and
> >>>>>> again in
> >>>>>>>>> Dremio. In both cases, it was all about trading short term
> benefits
> >>>>>> against
> >>>>>>>>> long term costs. In both cases, I think the net amount of work
> was
> >>>>>> probably
> >>>>>>>>> 5x as much as what it would have been if we had just done a
> better
> >>>>> job
> >>>>>>>>> engaging the community. If I were to state the curve of behavior
> >>>> over
> >>>>>> six
> >>>>>>>>> years, I'd guess that in both cases the numbers of effort looked
> >>>> like
> >>>>>> this:
> >>>>>>>>>
> >>>>>>>>> estimated effort doing high intensity integration with calcite
> >>>> (years
> >>>>>> 1-6)
> >>>>>>>>> fork: 1, 5, 10, 50, 100, 200, total = 366
> >>>>>>>>> non-fork: 10, 10, 10, 10, 10, total = 50
> >>>>>>>>>
> >>>>>>>>> So yes, the first couple years you're ahead. But you pay a
> massive
> >>>>>>>>> technical debt premium long term. Early in a project (Drill) or
> >>>>>> company's
> >>>>>>>>> life (Dremio), it can make sense to sacrifice long term for short
> >>>>> term
> >>>>>> but
> >>>>>>>>> it's important people do it with their eyes open.
> >>>>>>>>>
> >>>>>>>>> The reason that this pain is so high is that as your codebases
> >>>>>> diverge, you
> >>>>>>>>> start having to do everything the Calcite community does by
> >>>> yourself.
> >>>>>>>>> Backports become harder and things that you need (e.g. new SQL syntax,
> >>>>>>>>> etc.) have to be reimplemented (even if someone else already implemented
> >>>>>>>>> them in some post-fork Calcite version). Ultimately, at some point you
> >>>> realize
> >>>>>> that
> >>>>>>>>> your path is untenable and you unfork. This becomes the biggest
> >>>>>> expense of
> >>>>>>>>> them all and I believe both of those teams are still trying to
> >>>>>> un-fork. The
> >>>>>>>>> additional thing that becomes an even bigger problem is your
> >>>> absence
> >>>>>> from
> >>>>>>>>> the Calcite community means that people may take the project or
> >>>> APIs
> >>>>> in
> >>>>>>>>> ways that are in direct conflict to how you use the library.
> Since
> >>>>>> you're
> >>>>>>>>> not active in the project, you fail to provide a counterpoint and
> >>>>> then
> >>>>>>>>> you're basically just in a miserable place. The Hive project did
> >>>> this
> >>>>>> best
> >>>>>>>>> by ensuring that releases of Calcite were also run pre-release
> >>>>> against
> >>>>>> Hive
> >>>>>>>>> to make sure no major regressions occurred. By being in the
> >>>> community
> >>>>>> and
> >>>>>>>>> active, this is the best state from my pov. (It makes your
> project
> >>>>>> better
> >>>>>>>>> and Calcite better.)
> >>>>>>>>>
> >>>>>>>>> Two last notes:
> >>>>>>>>> - I'm not sure the rocks fork is comparable to forking Calcite.
> The
> >>>>> api
> >>>>>>>>> surface area and community models are very different.
> >>>>>>>>> - This is all based on a high intensity integration (using rules
> +
> >>>>>> planner
> >>>>>>>>> or sql + rules + planner). Calcite is frustratingly monolithic
> and
> >>>> if
> >>>>>>>>> someone was only going to use a small component, my opinion would
> >>>>>> likely be
> >>>>>>>>> very different.
> >>>>>>>>>
> >>>>>>>>> I'd send this to the Flink list but I'm not subscribed. It'd be
> >>>> great
> >>>>>> if
> >>>>>>>>> you shared it with the people over there if you think they'd find
> >>>> it
> >>>>>> useful.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <
> >>>>>> martijnvisser@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Julian and Austin!
> >>>>>>>>>>
> >>>>>>>>>> Any reply to kick-off some sort of discussion is worthwhile :D
> >>>>>>>>>> I definitely know the feeling of having more PRs open than you
> >>>> would
> >>>>>> like,
> >>>>>>>>>> looking at https://github.com/apache/flink/pulls :)
> >>>>>>>>>>
> >>>>>>>>>> There have been discussions in the Flink community about forking
> >>>>>> Calcite
> >>>>>>>>>> [1]. My personal preference at the moment is to see if we can
> >>>>> create a
> >>>>>>>>>> better collaboration and community. I believe that we can find
> >>>>> people
> >>>>>> from
> >>>>>>>>>> the Flink community who can open / help reviewing Calcite PRs
> that
> >>>>> are
> >>>>>>>>>> interesting for the Flink community. The question is if that
> will
> >>>>>> also help
> >>>>>>>>>> short term since in the end it still requires a Calcite
> maintainer
> >>>>> to
> >>>>>>>>>> review/merge.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>>
> >>>>>>>>>> Martijn
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>> https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <
> >>>>>>>>>> whatwouldaustindo@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> From the peanut gallery :-)  -->
> >>>>>>>>>>>
> >>>>>>>>>>> Wow; yes, lots of open PRs.
> >>>>> https://github.com/apache/calcite/pulls
> >>>>>>>>>>>
> >>>>>>>>>>> How can individuals from the Flink [sub-]community, and/or more
> >>>>>> general
> >>>>>>>>>>> calcite community help lighten this load?  Is there much weight
> >>>>>> given to
> >>>>>>>>>>> reviews from non-committers; how to increase the # of people
> >>>>> capable
> >>>>>> of
> >>>>>>>>>>> providing worthwhile reviews [ that are recognized as such ]?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <
> >>>>> jhyde.apache@gmail.com
> >>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Martijn,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Since you requested a reply, I am replying. To answer your
> >>>>>> question, I
> >>>>>>>>>>>> don’t know of a way to move this topic forward. We have more
> PRs
> >>>>>> than
> >>>>>>>>>>>> people to review them.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser <
> >>>>>>>>>> martijnvisser@apache.org
> >>>>>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I just wanted to reach out to the Calcite community once more
> >>>> on
> >>>>>> this
> >>>>>>>>>>>> topic
> >>>>>>>>>>>>> since no reply was received. Would be great if someone could
> >>>> get
> >>>>>> back
> >>>>>>>>>>> to
> >>>>>>>>>>>> us.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser <martijnvisser@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would like to follow-up on this email that was sent by
> Jing.
> >>>>> So
> >>>>>>>>>> far,
> >>>>>>>>>>>> no
> >>>>>>>>>>>>>> progress has been made, despite reaching out to the mailing
> >>>>> list,
> >>>>>>>>>> the
> >>>>>>>>>>>>>> original Jira ticket and reaching out to people directly. Is
> >>>>>> there a
> >>>>>>>>>>> way
> >>>>>>>>>>>>>> that we can move this PR/topic forward?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For context, in Apache Flink we're currently heavily using
> >>>>>> Calcite.
> >>>>>>>>>>>>>> However, we are now at the stage where Calcite is actually
> >>>>> holding
> >>>>>>>>>> us
> >>>>>>>>>>>> back.
> >>>>>>>>>>>>>> It would be great if we can find a way to strengthen our
> bond
> >>>>> and
> >>>>>>>>>> move
> >>>>>>>>>>>> both
> >>>>>>>>>>>>>> Calcite and Flink forward.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looking forward to your thoughts,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
> >>>>>>>>>>>>>>> Hi community,
> >>>>>>>>>>>>>>> My apologies for interrupting.
> >>>>>>>>>>>>>>> Could anyone help review the PR
> >>>>>>>>>>>>>>> https://github.com/apache/calcite/pull/2606?
> >>>>>>>>>>>>>>> Thanks a lot.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This
> Jira
> >>>>>> aims
> >>>>>>>>>> to
> >>>>>>>>>>>>>>> extend existing Table function in order to support
> >>>> Polymorphic
> >>>>>>>>>> Table
> >>>>>>>>>>>>>>> Function which is introduced as the part of ANSI SQL 2016.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The brief change logs of the PR are:
> >>>>>>>>>>>>>>> - Update `Parser.jj` to support partition by clause and
> order
> >>>>> by
> >>>>>>>>>>>> clause
> >>>>>>>>>>>>>>> for input table with set semantics of PTF
> >>>>>>>>>>>>>>> - Introduce `TableCharacteristics` which contains three
> >>>>>>>>>>>> characteristics
> >>>>>>>>>>>>>>> of input table of table function
> >>>>>>>>>>>>>>> - Update `SqlTableFunction` to add a method
> >>>>>>>>>> `tableCharacteristics`,
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> method returns the table characteristics for the ordinal-th
> >>>>>>>>>> argument
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>> this table function. Default return value is Optional.empty
> >>>>> which
> >>>>>>>>>>> means
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> ordinal-th argument is not table.
> >>>>>>>>>>>>>>> - Introduce `SqlSetSemanticsTable` which represents input
> >>>> table
> >>>>>>>>>> with
> >>>>>>>>>>>>>> set
> >>>>>>>>>>>>>>> semantics of Table Function, its `SqlKind` is
> >>>>>> `SET_SEMANTICS_TABLE`
> >>>>>>>>>>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics
> >>>>>>>>>>>>>>> table of a table function can have partition by and order by clauses
> >>>>>>>>>>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse
> >>>>> subQuery
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>>> represents set semantics table.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PR: https://github.com/apache/calcite/pull/2606
> >>>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> >>>>>>>>>>>>>>> Parent JIRA:
> >>>>> https://issues.apache.org/jira/browse/CALCITE-4864
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Jing Zhang
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>