Posted to dev@calcite.apache.org by Botong Huang <pk...@gmail.com> on 2021/04/04 00:15:52 UTC

Re: Proposal to extend Calcite into a incremental query optimizer

Hi all,

Apologies for the delay. It took us some time to clean up our code base and
publicly release it (which will be out soon) for a quick peek.

We are ready to present our work. Let's schedule a time for a Zoom
meeting and discuss how to integrate Tempura into Calcite.

Since some of our team members are in China, we prefer the time slot of
7:00pm-11:30pm PST any day. I've added our time preference in the shared
doc below.
https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

We encourage everyone to add their time preferences (during 04/15-04/30) in
this doc. In a week or so, we will try to settle on a time that works for
most.

Thanks,
Botong

On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com> wrote:

> Hi Julian and Rui,
>
> Sounds good to us. Please give us some time to prepare some slides for the
> meeting.
>
> I've created a doc below for discussion. Please feel free to add more in
> here:
>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>
> Thanks,
> Botong
>
> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jh...@gmail.com>
> wrote:
>
>> PS The “editable doc” that Rui refers to is also a good idea. I think we
>> should create it to continue discussion after the first meeting.
>>
>> Julian
>>
>> > On Jan 28, 2021, at 11:16 AM, Julian Hyde <jh...@gmail.com>
>> wrote:
>> >
>> > I think good next steps would be a PR and a meeting. The PR will allow
>> us to read the code, but I think we should do the first round of questions
>> at the meeting.  The meeting could perhaps start with a presentation of the
>> paper (do you have some slides you are planning to present at VLDB,
>> Botong?) and then move on to questions about the concepts, which
>> alternatives were considered, and how the concepts map onto other current
>> and future concepts in calcite.
>> >
>> > I don’t think we should start “reviewing” the PR line-by-line at this
>> point. We need to understand the high-level concepts and design choices. If
>> we start reviewing the PR we will get lost in the details.
>> >
>> > I know that integrating a major change is hard; I doubt that we will be
>> able to integrate everything, but we can build understanding about where
>> calcite needs to go, and I hope integrate a good amount of code to help us
>> get there.
>> >
>> > As I said before, after the integration I would like people to be able
>> to experiment with it and use it in their production systems.  That way, it
>> will not be an experiment that withers, but a feature set that integrates with
>> other calcite features and gets stronger over time.
>> >
>> > Julian
>> >
>> >> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org> wrote:
>> >>
>> >> For me to participate in the discussion of the above questions, I will
>> >> need to read a lot more to learn the relevant context and will likely ask
>> >> lots of questions :-). An editable doc is probably good for questions and
>> >> back-and-forth discussion.
>> >>
>> >>
>> >> -Rui
>> >>
>> >>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <am...@apache.org>
>> wrote:
>> >>>
>> >>> I am also happy to help push this work into Calcite (review code and
>> doc,
>> >>> etc.).
>> >>>
>> >>> While you can share your code so people can have more idea how it is
>> >>> implemented, I think it would be also nice to have a doc to discuss
>> open
>> >>> questions above. I copy some of those points here:
>> >>>
>> >>> 1. Can this solution be compatible with existing solutions in Calcite
>> >>> Streaming, materialized view maintenance, and multi-query optimization
>> >>> (Sigma and Delta relational operators, lattice, and Spool operator),
>> >>> 2. Did you find that you needed two separate cost models - one for
>> “view
>> >>> maintenance” and another for “user queries” - since the objectives of
>> each
>> >>> activity are so different?
>> >>> 3. whether this work will hasten the arrival of multi-objective
>> parametric
>> >>> query optimization [1] in Calcite.
>> >>> 4. probably SQL shell support.
>> >>>
>> >>>
>> >>> [1]:
>> >>>
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> >>>
>> >>>
>> >>> -Rui
>> >>>
>> >>>
>> >>>
>> >>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com> wrote:
>> >>>>
>> >>>> it would be very nice to see a POC of your work.
>> >>>>
>> >>>>
>> >>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pk...@gmail.com>
>> wrote:
>> >>>>
>> >>>>> Hi Julian,
>> >>>>>
>> >>>>> Just wondering if there are any updates? We are wondering if it
>> would
>> >>>> help
>> >>>>> to post our code for a quick preview.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Botong
>> >>>>>
>> >>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pk...@gmail.com>
>> wrote:
>> >>>>>
>> >>>>>> Hi Julian,
>> >>>>>>
>> >>>>>> Thanks for your interest! Sure let's figure out a plan that best
>> >>>> benefits
>> >>>>>> the community. Here are some clarifications that hopefully answer
>> your
>> >>>>>> questions.
>> >>>>>>
>> >>>>>> In our work (Tempura), users specify the set of time points to
>> >>>> consider
>> >>>>>> running and a cost function that expresses users' preference over
>> >>>> time,
>> >>>>>> Tempura will generate the best incremental plan that minimizes the
>> >>>>> overall
>> >>>>>> cost function.
>> >>>>>>
>> >>>>>> In this incremental plan, the sub-plans at different time points
>> can
>> >>>> be
>> >>>>>> different from each other, as opposed to identical plans in all
>> delta
>> >>>>> runs
>> >>>>>> as in streaming or IVM. As mentioned in §2.1 of the Tempura paper,
>> we
>> >>>> can
>> >>>>>> mimic the current streaming implementation by specifying two
>> (logical)
>> >>>>> time
>> >>>>>> points in Tempura, representing the initial run and later delta
>> runs
>> >>>>>> respectively. In general, note that Tempura supports various forms
>> of
>> >>>>>> incremental computing, not only the small-delta append-only data
>> >>>> model in
>> >>>>>> streaming systems. That's why we believe Tempura subsumes the
>> current
>> >>>>>> streaming support, as well as any IVM implementations.
>> >>>>>>
>> >>>>>> About the cost model, we did not come up with a separate cost
>> model,
>> >>>> but
>> >>>>>> rather extended the existing one. Similar to multi-objective
>> >>>>> optimization,
>> >>>>>> costs incurred at different time points are considered different
>> >>>>>> dimensions. Tempura lets users supply a function that converts this
>> >>>> cost
>> >>>>>> vector into a final cost. So under this function, any two
>> incremental
>> >>>>> plans
>> >>>>>> are still comparable and there is an overall optimum. I guess we
>> can
>> >>>> go
>> >>>>>> down the route of multi-objective parametric query optimization
>> >>>> instead
>> >>>>> if
>> >>>>>> there is a need.
>> >>>>>>
>> >>>>>> Next on materialized views and multi-query optimization, since our
>> >>>>>> multi-time-point plan naturally involves materializing intermediate
>> >>>>> results
>> >>>>>> for later time points, we need to solve the problem of choosing
>> >>>>>> materializations and include the cost of saving and reusing the
>> >>>>>> materializations when costing and comparing plans. We borrowed the
>> >>>>>> multi-query optimization techniques to solve this problem even
>> though
>> >>>> we
>> >>>>>> are looking at a single query. As a result, we think our work is
>> >>>>> orthogonal
>> >>>>>> to Calcite's facilities around utilizing existing views, lattice
>> etc.
>> >>>> We
>> >>>>> do
>> >>>>>> feel that the multi-query optimization component can be adopted to
>> >>>> wider
>> >>>>>> use, but probably need more suggestions from the community.
>> >>>>>>
>> >>>>>> Lastly, our current implementation is set up in Java code; it
>> should
>> >>>> be
>> >>>>>> straightforward to hook it up with SQL shell.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Botong
>> >>>>>>
>> >>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>> jhyde.apache@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Botong,
>> >>>>>>>
>> >>>>>>> This is very exciting; congratulations on this research, and thank
>> >>>> you
>> >>>>>>> for contributing it back to Calcite.
>> >>>>>>>
>> >>>>>>> The research touches several areas in Calcite: streaming,
>> >>>> materialized
>> >>>>>>> view maintenance, and multi-query optimization. As we have already
>> >>>> some
>> >>>>>>> solutions in those areas (Sigma and Delta relational operators,
>> >>>> lattice,
>> >>>>>>> and Spool operator), it will be interesting to see whether we can
>> >>>> make
>> >>>>> them
>> >>>>>>> compatible, or whether one concept can subsume others.
>> >>>>>>>
>> >>>>>>> Your work differs from streaming queries in that your relations
>> are
>> >>>> used
>> >>>>>>> by “external” user queries, whereas in pure streaming queries, the
>> >>>> only
>> >>>>>>> activity is the change propagation. Did you find that you needed
>> two
>> >>>>>>> separate cost models - one for “view maintenance” and another for
>> >>>> “user
>> >>>>>>> queries” - since the objectives of each activity are so different?
>> >>>>>>>
>> >>>>>>> I wonder whether this work will hasten the arrival of
>> multi-objective
>> >>>>>>> parametric query optimization [1] in Calcite.
>> >>>>>>>
>> >>>>>>> I will make time over the next few days to read and digest your
>> >>>> paper.
>> >>>>>>> Then I expect that we will have a back-and-forth process to create
>> >>>>>>> something that will be useful for the broader community.
>> >>>>>>>
>> >>>>>>> One thing will be particularly useful: making this functionality
>> >>>>>>> available from a SQL shell, so that people can experiment with
>> this
>> >>>>>>> functionality without writing Java code or setting up complex
>> >>>> databases
>> >>>>> and
>> >>>>>>> metadata. I have in mind something like the simple DDL operations
>> >>>> that
>> >>>>> are
>> >>>>>>> available in Calcite’s ’server’ module. I wonder whether we could
>> >>>> devise
>> >>>>>>> some kind of SQL syntax for a “multi-query”.
>> >>>>>>>
>> >>>>>>> Julian
>> >>>>>>>
>> >>>>>>> [1]
>> >>>>>>>
>> >>>>>
>> >>>>
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pk...@gmail.com>
>> >>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Thanks Aron for pointing this out. To see the figure, please
>> refer
>> >>>> to
>> >>>>>>> Fig
>> >>>>>>>> 3(a) in our paper:
>> >>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Botong
>> >>>>>>>>
>> >>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <ta...@gmail.com>
>> >>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you open
>> >>>>>>>>> a JIRA for this, so that people who are interested can subscribe to it?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Regards!
>> >>>>>>>>>
>> >>>>>>>>> Aron Tao
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi all,
>> >>>>>>>>>>
>> >>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
>> general
>> >>>>>>>>>> incremental query optimizer, based on our research paper
>> >>>> published
>> >>>>> in
>> >>>>>>>>> VLDB
>> >>>>>>>>>> 2021:
>> >>>>>>>>>> Tempura: a general cost-based optimizer framework for
>> incremental
>> >>>>> data
>> >>>>>>>>>> processing
>> >>>>>>>>>>
>> >>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s
>> >>>> data
>> >>>>>>>>>> warehouse is planning to use this incremental query optimizer
>> to
>> >>>>>>>>> alleviate
>> >>>>>>>>>> cluster-wise resource skewness:
>> >>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
>> Incremental
>> >>>>>>>>> Computing
>> >>>>>>>>>>
>> >>>>>>>>>> To the best of our knowledge, this is the first general cost-based
>> >>>>>>> incremental
>> >>>>>>>>>> optimizer that can find the best plan across multiple families
>> of
>> >>>>>>>>>> incremental computing methods, including IVM, Streaming,
>> >>>> DBToaster,
>> >>>>>>> etc.
>> >>>>>>>>>> Experiments (in the paper) show that the generated best plan
>> is
>> >>>>>>>>>> consistently much better than the plans from each individual
>> >>>> method
>> >>>>>>>>> alone.
>> >>>>>>>>>>
>> >>>>>>>>>> In general, incremental query planning is central to database
>> >>>> view
>> >>>>>>>>>> maintenance and stream processing systems, and is being
>> adopted
>> >>>> in
>> >>>>>>>>> active
>> >>>>>>>>>> databases, resumable query execution, approximate query
>> >>>> processing,
>> >>>>>>> etc.
>> >>>>>>>>> We
>> >>>>>>>>>> are hoping that this feature can help widen the spectrum of
>> >>>>>>> Calcite,
>> >>>>>>>>>> solicit more use cases and adoption of Calcite.
>> >>>>>>>>>>
>> >>>>>>>>>> Below is a brief description of the technical details. Please
>> >>>> refer
>> >>>>> to
>> >>>>>>>>> the
>> >>>>>>>>>> Tempura paper for more details. We are also working on a
>> journal
>> >>>>>>> version
>> >>>>>>>>> of
>> >>>>>>>>>> the paper with more implementation details.
>> >>>>>>>>>>
>> >>>>>>>>>> Currently the query plan generated by Calcite is meant to be
>> >>>>> executed
>> >>>>>>>>>> altogether at once. In the proposal, Calcite’s memo will be
>> >>>> extended
>> >>>>>>> with
>> >>>>>>>>>> temporal information so that it is capable of generating
>> >>>> incremental
>> >>>>>>>>> plans
>> >>>>>>>>>> that include multiple sub-plans to execute at different time
>> >>>> points.
>> >>>>>>>>>>
>> >>>>>>>>>> The main idea is to view each table as one that changes over
>> time
>> >>>>>>> (Time
>> >>>>>>>>>> Varying Relations (TVR)). To achieve that we introduced
>> >>>> TvrMetaSet
>> >>>>>>> into
>> >>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track related
>> >>>> RelSets
>> >>>>>>> of a
>> >>>>>>>>>> changing table (e.g. snapshot of the table at certain time,
>> >>>> delta of
>> >>>>>>> the
>> >>>>>>>>>> table between two time points, etc.).
>> >>>>>>>>>>
>> >>>>>>>>>> [image: image.png]
>> >>>>>>>>>>
>> >>>>>>>>>> For example in the above figure, each vertical line is a
>> >>>> TvrMetaSet
>> >>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
>> Horizontal
>> >>>>> lines
>> >>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users
>> can
>> >>>>>>> write
>> >>>>>>>>> TVR
>> >>>>>>>>>> Rewrite Rules to describe valid transformations between these
>> >>>> dots.
>> >>>>>>> For
>> >>>>>>>>>> example, the blue lines are inter-TVR rules that describe how
>> to
>> >>>>>>> compute
>> >>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The red
>> lines
>> >>>>> are
>> >>>>>>>>>> intra-TVR rules that describe transformations within a TVR. All
>> >>>> TVR
>> >>>>>>>>> rewrite
>> >>>>>>>>>> rules are logical rules. All existing Calcite rules still work
>> in
>> >>>>> the
>> >>>>>>> new
>> >>>>>>>>>> volcano system without modification.
>> >>>>>>>>>>
>> >>>>>>>>>> All changes in this feature will consist of four parts:
>> >>>>>>>>>> 1. Memo extension with TvrMetaSet
>> >>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
>> >>>> RelNodes,
>> >>>>>>> as
>> >>>>>>>>>> well as links in between the nodes.
>> >>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule
>> >>>> engine
>> >>>>>>> API.
>> >>>>>>>>>> 4. Multi-query optimization, used to find the best incremental
>> >>>> plan
>> >>>>>>>>>> involving multiple time points.
>> >>>>>>>>>>
>> >>>>>>>>>> Note that this feature is an extension in nature and thus when
>> >>>>>>> disabled,
>> >>>>>>>>>> does not change any existing Calcite behavior.
>> >>>>>>>>>>
>> >>>>>>>>>> Other than scenarios in the paper, we also applied this
>> >>>>>>> Calcite-extended
>> >>>>>>>>>> incremental query optimizer to a type of periodic query called
>> >>>> the
>> >>>>>>>>> ‘‘range
>> >>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost savings
>> of
>> >>>> 80%
>> >>>>>>> on
>> >>>>>>>>>> total CPU and memory consumption, and 60% on end-to-end
>> execution
>> >>>>>>> time.
>> >>>>>>>>>>
>> >>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
>> >>>> holidays!
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Botong
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> ~~~~~~~~~~~~~~~
>> >>>> no mistakes
>> >>>> ~~~~~~~~~~~~~~~~~~
>> >>>>
>> >>>
>>
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi Haisheng,

Thanks for the reminder. Yeah I've been occupied with several other
deadlines. We will try to come up with something by next week.

Best,
Botong

On Wed, Jul 7, 2021 at 6:05 PM Haisheng Yuan <hy...@apache.org> wrote:

> Hi Botong,
>
> We haven't heard from you for a while.
> Feel free to reach out if you get stuck or need help on rebasing code.
>
> Thanks,
> Haisheng
>
> On 2021/05/15 00:54:02, Botong Huang <pk...@gmail.com> wrote:
> > Hi all,
> >
> > Thank you all for the interest, and thanks Julian for the update!
> >
> > I am having problems uploading the pdf files into the jira CALCITE-4568
> > <https://issues.apache.org/jira/browse/CALCITE-4568>, so I attached the
> > slides in our code base:
> >
> https://github.com/alibaba/cost-based-incremental-optimizer/blob/main/Tempura_Calcite_presentation.pdf
> >
> > The slides contain a walk-through example of how Tempura expands its memo. The
> > current version of the code also has two e2e unit tests at
> > TvrOptimizationTest.java and TvrExecutionTest.java. Please feel free to
> > start playing with them, and feel free to reach out and possibly schedule
> > another meeting if needed.
> >
> > As agreed in the meeting, we will rebase our code to a newer version of
> > Calcite.
> >
> > Best,
> > Botong
> >
> > On Thu, May 13, 2021 at 12:47 PM Julian Hyde <jh...@gmail.com>
> wrote:
> >
> > > During the meeting we agreed to start progressing this contribution in
> the
> > > usual Apache Way, with conversations on the dev list and in the
> > > https://issues.apache.org/jira/browse/CALCITE-4568 JIRA case. So, it
> > > should be easy for you to participate.
> > >
> > > Botong said he would share the slides. (He might be unwilling to make
> them
> > > public, because they are his presentation for a conference that has not
> > > happened yet. Reach out to him one-to-one.)
> > >
> > > Next step is for someone on the Alibaba side to create a PR that is
> > > rebased on the latest Calcite master, and add a comment to the JIRA
> case.
> > > Then we can discuss what needs to be done for that PR. Code quality,
> adding
> > > comments, breaking up into smaller commits, additional tests, renaming
> > > packages/classes, restructuring into plugins are all possibilities.
> > >
> > > Our side of the bargain, as committers, is that we should review in a
> > > timely manner, and not move the goal posts — if the contributors make
> the
> > > changes we request then we will land this code in master in a
> reasonable
> > > amount of time.
> > >
> > > We also discussed incremental view maintenance (IVM). Tempura solves a
> > > more general problem (finding the optimal K steps to maintain a
> > > materialized view as data arrives in K points in time) but if we set
> K=2,
> > > we can generate a plan for how to update a materialized view given a
> delta
> > > table. The plan will be different based on cost - e.g. whether the
> delta
> > > table is small or large. This is a problem that many of our users would
> > > like to solve. It will exercise much of Tempura’s code base, and
> encourage
> > > contributions.
> > >
> > > In my opinion, we should do IVM at launch. It should be the main
> example
> > > we use in conference talks, blog posts, etc. When people understand
> that
> > > case, we can explain how we generalize from K=2 to arbitrary K.
> > >
> > > Julian
> > >
> > >
> > > > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> > > >
> > > > I apologize that I had a wrong impression on the meeting time (I
> thought
> > > it
> > > > should be on Thursday but it is Wednesday). I can follow up your
> meeting
> > > > records if you have any.
> > > >
> > > >
> > > > -Rui
> > > >
> > > > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com>
> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> This is a reminder that we are going to have our second discussion
> > > meeting
> > > >> tomorrow at 10-11pm PST. Please find the link below, everyone is
> > > welcome to
> > > >> join!
> > > >>
> > > >> Join Zoom Meeting
> > > >> https://uci.zoom.us/j/91986206610
> > > >>
> > > >> Meeting ID: 919 8620 6610
> > > >> One tap mobile
> > > >> +16699006833,,91986206610# US (San Jose)
> > > >> +12532158782,,91986206610# US (Tacoma)
> > > >>
> > > >> Dial by your location
> > > >>        +1 669 900 6833 US (San Jose)
> > > >>        +1 253 215 8782 US (Tacoma)
> > > >>        +1 346 248 7799 US (Houston)
> > > >>        +1 301 715 8592 US (Washington DC)
> > > >>        +1 312 626 6799 US (Chicago)
> > > >>        +1 646 558 8656 US (New York)
> > > >> Meeting ID: 919 8620 6610
> > > >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> > > >>
> > > >> Join by Skype for Business
> > > >> https://uci.zoom.us/skype/91986206610
> > > >>
> > > >> Thanks,
> > > >> Botong
> > > >>
> > > >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com>
> wrote:
> > > >>
> > > >>> Hi Stamatis and all,
> > > >>>
> > > >>> Thanks for the interest! Let's tentatively schedule the next meeting for
> > > >>> next Wednesday, May 12, 10pm-11pm PST, then. Please let us know if any
> > > >>> new needs come up.
> > > >>>
> > > >>> Best,
> > > >>> Botong
> > > >>>
> > > >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <
> zabetak@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I really regret missing the first meeting, sorry about that. I
> added
> > > my
> > > >>>> preferences in the document.
> > > >>>> I will make sure to attend the next one and help as much as I can.
> > > >>>>
> > > >>>> I didn't have the chance yet to go over the paper but will try to
> do
> > > it
> > > >>>> before the next meeting.
> > > >>>>
> > > >>>> For me the following dates are more convenient than others so it
> would
> > > >> be
> > > >>>> nice if we could arrange it then.
> > > >>>>
> > > >>>> Thu, May 6, 10pm PST
> > > >>>> Tue, May 12, 10pm PST
> > > >>>>
> > > >>>> Best,
> > > >>>> Stamatis
> > > >>>>
> > > >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org>
> wrote:
> > > >>>>
> > > >>>>> I have added my time preferences to the doc [1]. I am generally
> > > >>>>> available any evening Mon - Thu. How about we meet Monday 10th
> May?
> > > >>>>>
> > > >>>>> Stamatis, Jesus, Given the complexity of this work, I would very
> much
> > > >>>>> appreciate your insight, as experts in optimizer theory. Could
> one of
> > > >>>>> you join the next meeting? Of course we should choose a time that
> > > >>>>> works for everyone's schedule.
> > > >>>>>
> > > >>>>> Julian
> > > >>>>>
> > > >>>>> [1]
> > > >>>>>
> > > >>>>
> > > >>
> > >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>>>
> > > >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
> > > >> wrote:
> > > >>>>>>
> > > >>>>>> We didn't record it, we will try to record the following
> meetings.
> > > >>>> Please
> > > >>>>>> add your time preference in the docs, so that we can find a
> meeting
> > > >>>> time
> > > >>>>>> that works for more people.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Botong
> > > >>>>>>
> > > >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
> > > >> viliam@hazelcast.com>
> > > >>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Is there a recording available?
> > > >>>>>>> Viliam
> > > >>>>>>>
> > > >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
> > > >>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi all,
> > > >>>>>>>>
> > > >>>>>>>> The meeting yesterday was fun and productive. As discussed,
> this
> > > >>>> is
> > > >>>>> the
> > > >>>>>>>> call to schedule our second meeting.
> > > >>>>>>>>
> > > >>>>>>>> We encourage everyone to add their time preferences during
> > > >> 05/01 -
> > > >>>>> 05/15
> > > >>>>>>>> here:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> Botong
> > > >>>>>>>>
> > > >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <
> pkuhbt@gmail.com>
> > > >>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi all,
> > > >>>>>>>>> We've created a zoom meeting below for our meeting next
> Monday
> > > >>>>>>>>> (9pm-10:30pm PST on 04/26).
> > > >>>>>>>>> Talk to you all soon!
> > > >>>>>>>>>
> > > >>>>>>>>> Join Zoom Meeting
> > > >>>>>>>>> https://uci.zoom.us/j/91279732686
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Meeting ID: 912 7973 2686
> > > >>>>>>>>> One tap mobile
> > > >>>>>>>>> +16699006833,,91279732686# US (San Jose)
> > > >>>>>>>>> +12532158782,,91279732686# US (Tacoma)
> > > >>>>>>>>>
> > > >>>>>>>>> Dial by your location
> > > >>>>>>>>> +1 669 900 6833 US (San Jose)
> > > >>>>>>>>> +1 253 215 8782 US (Tacoma)
> > > >>>>>>>>> +1 346 248 7799 US (Houston)
> > > >>>>>>>>> +1 301 715 8592 US (Washington DC)
> > > >>>>>>>>> +1 312 626 6799 US (Chicago)
> > > >>>>>>>>> +1 646 558 8656 US (New York)
> > > >>>>>>>>> Meeting ID: 912 7973 2686
> > > >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Join by Skype for Business
> > > >>>>>>>>> https://uci.zoom.us/skype/91279732686
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks,
> > > >>>>>>>>> Botong
> > > >>>>>>>>>
> > > >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
> > > >> pkuhbt@gmail.com
> > > >>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi all,
> > > >>>>>>>>>>
> > > >>>>>>>>>> According to the preferences collected, we are tentatively
> > > >>>>> scheduling
> > > >>>>>>>> our
> > > >>>>>>>>>> meeting at 9pm-10:30pm PST on 04/26 Monday.
> > > >>>>>>>>>>
> > > >>>>>>>>>> We will give a presentation about Tempura, followed by a
> free
> > > >>>>>>>> discussion.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Please let us know if there are any other requests. A few days before
> > > >>>>>>>>>> the meeting, I will send out a Zoom meeting link.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks,
> > > >>>>>>>>>> Botong
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
> > > >> pkuhbt@gmail.com>
> > > >>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Julian and all,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take
> > > >> a
> > > >>>>> quick
> > > >>>>>>>> peek
> > > >>>>>>>>>>> at the last five commits.
> > > >>>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
> > > >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>),
> > > >> which
> > > >>>>> will
> > > >>>>>>>> serve
> > > >>>>>>>>>>> as the umbrella Jira for the feature.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> In the meantime, we encourage everyone to enter the time
> > > >>>>> preferences
> > > >>>>>>>> for
> > > >>>>>>>>>>> our first meeting here:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks,
> > > >>>>>>>>>>> Botong
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> > > >>>>> jhyde.apache@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> I have added my time preferences to the doc.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Initial discussions will need to be about architecture and
> > > >>>>>>> high-level
> > > >>>>>>>>>>>> design. So I would ask Calcite reviewers not to review the
> > > >> PR
> > > >>>>>>>> line-by-line
> > > >>>>>>>>>>>> (or to leave comments in GitHub) but try to understand the
> > > >>>>> design
> > > >>>>>>>>>>>> holistically, and prepare questions/comments before the
> > > >>>> meeting.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Botong, Can you please create a Calcite JIRA case for this
> > > >>>> task?
> > > >>>>>>> JIRA
> > > >>>>>>>>>>>> is how we track long-running tasks such as this.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Julian
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
> > > >> pkuhbt@gmail.com
> > > >>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Apologies for the delay. It took us some time to clean up
> > > >> our
> > > >>>>> code
> > > >>>>>>>> base
> > > >>>>>>>>>>>> and
> > > >>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
> > > >>>> peek.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
> > > >>>> for a
> > > >>>>> Zoom
> > > >>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
> > > >> Calcite.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Since some of our team members are in China, we prefer
> > > >> the
> > > >>>>> time
> > > >>>>>>> slot
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
> > > >> preference
> > > >>>> in
> > > >>>>> the
> > > >>>>>>>>>>>> shared
> > > >>>>>>>>>>>>> doc below.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> We encourage everyone to add their time preferences
> > > >> (during
> > > >>>>>>>>>>>> 04/15-04/30) in
> > > >>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
> > > >>>> that
> > > >>>>> works
> > > >>>>>>>> for
> > > >>>>>>>>>>>>> most.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>> Botong
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> > > >>>>> pkuhbt@gmail.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi Julian and Rui,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
> > > >>>> some
> > > >>>>>>> slides
> > > >>>>>>>>>>>> for the
> > > >>>>>>>>>>>>>> meeting.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
> > > >> free
> > > >>>> to
> > > >>>>> add
> > > >>>>>>>>>>>> more in
> > > >>>>>>>>>>>>>> here:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>> Botong
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > > >>>>>>>> jhyde.apache@gmail.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
> > > >>>>> idea. I
> > > >>>>>>>>>>>> think we
> > > >>>>>>>>>>>>>>> should create it to continue discussion after the first
> > > >>>>> meeting.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Julian
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > > >>>>>>>> jhyde.apache@gmail.com>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
> > > >>>> The
> > > >>>>> PR
> > > >>>>>>> will
> > > >>>>>>>>>>>> allow
> > > >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
> > > >>>>> round of
> > > >>>>>>>>>>>> questions
> > > >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
> > > >>>>>>>>>>>> presentation of the
> > > >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
> > > >>>> present
> > > >>>>> at
> > > >>>>>>>> VLDB,
> > > >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
> > > >>>> concepts,
> > > >>>>> which
> > > >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
> > > >>>> onto
> > > >>>>>>> other
> > > >>>>>>>>>>>> current
> > > >>>>>>>>>>>>>>> and future concepts in calcite.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
> > > >>>>> line-by-line
> > > >>>>>>> at
> > > >>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
> > > >> and
> > > >>>>> design
> > > >>>>>>>>>>>> choices. If
> > > >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
> > > >>>> details.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
> > > >> doubt
> > > >>>>> that we
> > > >>>>>>>>>>>> will be
> > > >>>>>>>>>>>>>>> able to integrate everything, but we can build
> > > >>>> understanding
> > > >>>>>>> about
> > > >>>>>>>>>>>> where
> > > >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
> > > >>>> of
> > > >>>>> code
> > > >>>>>>> to
> > > >>>>>>>>>>>> help us
> > > >>>>>>>>>>>>>>> get there.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> As I said before, after the integration I would like
> > > >>>>> people to
> > > >>>>>>> be
> > > >>>>>>>>>>>> able
> > > >>>>>>>>>>>>>>> to experiment with it and use it in their production
> > > >>>>> systems.
> > > >>>>>>>> That
> > > >>>>>>>>>>>> way, it
> > > >>>>>>>>>>>>>>> will not be an experiment that withers, but a feature
> > > >> set
> > > >>>>>>>>>>>> that integrates with
> > > >>>>>>>>>>>>>>> other calcite features and gets stronger over time.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Julian
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> > > >>>>> amaliujia@apache.org>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> For me to participate in the discussion for the
> > > >> above
> > > >>>>>>>> questions,
> > > >>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>> need to read a lot more to know relevant context and
> > > >>>>> likely
> > > >>>>>>> ask
> > > >>>>>>>>>>>> lots of
> > > >>>>>>>>>>>>>>>>> questions :-).  An editable doc is probably good for
> > > >>>>> questions
> > > >>>>>>>> and
> > > >>>>>>>>>>>> back
> > > >>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>> forward discussion.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> -Rui
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > > >>>>>>>> amaliujia@apache.org
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
> > > >>>>> (review
> > > >>>>>>>> code
> > > >>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>> doc,
> > > >>>>>>>>>>>>>>>>>> etc.).
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> While you can share your code so people can have
> > > >> more
> > > >>>>> idea
> > > >>>>>>> how
> > > >>>>>>>>>>>> it is
> > > >>>>>>>>>>>>>>>>>> implemented, I think it would be also nice to have a
> > > >>>> doc
> > > >>>>> to
> > > >>>>>>>>>>>> discuss
> > > >>>>>>>>>>>>>>> open
> > > >>>>>>>>>>>>>>>>>> questions above. I have copied some of those points
> > > >>>> here:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing
> > > >>>>> solutions in
> > > >>>>>>>>>>>> Calcite
> > > >>>>>>>>>>>>>>>>>> Streaming, materialized view maintenance, and
> > > >>>> multi-query
> > > >>>>>>>>>>>> optimization
> > > >>>>>>>>>>>>>>>>>> (Sigma and Delta relational operators, lattice, and
> > > >>>> Spool
> > > >>>>>>>>>>>> operator)?
> > > >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost
> > > >>>> models
> > > >>>>> -
> > > >>>>>>> one
> > > >>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>> “view
> > > >>>>>>>>>>>>>>>>>> maintenance” and another for “user queries” - since
> > > >>>> the
> > > >>>>>>>>>>>> objectives of
> > > >>>>>>>>>>>>>>> each
> > > >>>>>>>>>>>>>>>>>> activity are so different?
> > > >>>>>>>>>>>>>>>>>> 3. whether this work will hasten the arrival of
> > > >>>>>>> multi-objective
> > > >>>>>>>>>>>>>>> parametric
> > > >>>>>>>>>>>>>>>>>> query optimization [1] in Calcite.
> > > >>>>>>>>>>>>>>>>>> 4. probably SQL shell support.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> [1]:
> > > >>>>>>>>>>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> -Rui
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> > > >>>>> zinking3@gmail.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > > >>>>>>>>>>>> pkuhbt@gmail.com>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Hi Julian,
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
> > > >>>>> wondering
> > > >>>>>>> if
> > > >>>>>>>> it
> > > >>>>>>>>>>>>>>> would
> > > >>>>>>>>>>>>>>>>>>> help
> > > >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>> Botong
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > > >>>>>>>> pkuhbt@gmail.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Hi Julian,
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
> > > >>>> plan
> > > >>>>>>> that
> > > >>>>>>>>>>>> best
> > > >>>>>>>>>>>>>>>>>>> benefits
> > > >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
> > > >>>>> hopefully
> > > >>>>>>>>>>>> answer
> > > >>>>>>>>>>>>>>> your
> > > >>>>>>>>>>>>>>>>>>>>> questions.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of
> > > >>>> time
> > > >>>>>>> points
> > > >>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>> consider
> > > >>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users'
> > > >>>>>>> preference
> > > >>>>>>>>>>>> over
> > > >>>>>>>>>>>>>>>>>>> time,
> > > >>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan
> > > >>>> that
> > > >>>>>>>>>>>> minimizes the
> > > >>>>>>>>>>>>>>>>>>>> overall
> > > >>>>>>>>>>>>>>>>>>>>> cost function.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at
> > > >>>> different
> > > >>>>> time
> > > >>>>>>>>>>>> points
> > > >>>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to
> > > >> identical
> > > >>>>> plans
> > > >>>>>>> in
> > > >>>>>>>>>>>> all
> > > >>>>>>>>>>>>>>> delta
> > > >>>>>>>>>>>>>>>>>>>> runs
> > > >>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in §2.1 of
> > > >> the
> > > >>>>>>> Tempura
> > > >>>>>>>>>>>> paper,
> > > >>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>>>>>>>> mimic the current streaming implementation by
> > > >>>>> specifying
> > > >>>>>>> two
> > > >>>>>>>>>>>>>>> (logical)
> > > >>>>>>>>>>>>>>>>>>>> time
> > > >>>>>>>>>>>>>>>>>>>>> points in Tempura, representing the initial run
> > > >> and
> > > >>>>> later
> > > >>>>>>>>>>>> delta
> > > >>>>>>>>>>>>>>> runs
> > > >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura
> > > >>>> supports
> > > >>>>>>> various
> > > >>>>>>>>>>>> forms
> > > >>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>> incremental computing, not only the small-delta
> > > >>>>>>> append-only
> > > >>>>>>>>>>>> data
> > > >>>>>>>>>>>>>>>>>>> model in
> > > >>>>>>>>>>>>>>>>>>>>> streaming systems. That's why we believe Tempura
> > > >>>>> subsumes
> > > >>>>>>>> the
> > > >>>>>>>>>>>>>>> current
> > > >>>>>>>>>>>>>>>>>>>>> streaming support, as well as any IVM
> > > >>>> implementations.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a
> > > >>>>> separate
> > > >>>>>>>> cost
> > > >>>>>>>>>>>>>>> model,
> > > >>>>>>>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>>>>>> rather extended the existing one. Similar to
> > > >>>>>>> multi-objective
> > > >>>>>>>>>>>>>>>>>>>> optimization,
> > > >>>>>>>>>>>>>>>>>>>>> costs incurred at different time points are
> > > >>>> considered
> > > >>>>>>>>>>>> different
> > > >>>>>>>>>>>>>>>>>>>>> dimensions. Tempura lets users supply a function
> > > >>>> that
> > > >>>>>>>>>>>> converts this
> > > >>>>>>>>>>>>>>>>>>> cost
> > > >>>>>>>>>>>>>>>>>>>>> vector into a final cost. So under this function,
> > > >>>> any
> > > >>>>> two
> > > >>>>>>>>>>>>>>> incremental
> > > >>>>>>>>>>>>>>>>>>>> plans
> > > >>>>>>>>>>>>>>>>>>>>> are still comparable and there is an overall
> > > >>>> optimum.
> > > >>>>> I
> > > >>>>>>>> guess
> > > >>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>>>>>> go
> > > >>>>>>>>>>>>>>>>>>>>> down the route of multi-objective parametric
> > > >> query
> > > >>>>>>>>>>>> optimization
> > > >>>>>>>>>>>>>>>>>>> instead
> > > >>>>>>>>>>>>>>>>>>>> if
> > > >>>>>>>>>>>>>>>>>>>>> there is a need.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Next on materialized views and multi-query
> > > >>>>> optimization,
> > > >>>>>>>>>>>> since our
> > > >>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves
> > > >>>> materializing
> > > >>>>>>>>>>>> intermediate
> > > >>>>>>>>>>>>>>>>>>>> results
> > > >>>>>>>>>>>>>>>>>>>>> for later time points, we need to solve the
> > > >>>> problem of
> > > >>>>>>>>>>>> choosing
> > > >>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving
> > > >> and
> > > >>>>>>> reusing
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing
> > > >> plans.
> > > >>>> We
> > > >>>>>>>>>>>> borrowed the
> > > >>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this
> > > >>>>> problem
> > > >>>>>>>> even
> > > >>>>>>>>>>>>>>> though
> > > >>>>>>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we
> > > >>>> think
> > > >>>>> our
> > > >>>>>>>> work
> > > >>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>> orthogonal
> > > >>>>>>>>>>>>>>>>>>>>> to Calcite's facilities around utilizing existing
> > > >>>>> views,
> > > >>>>>>>>>>>> lattice
> > > >>>>>>>>>>>>>>> etc.
> > > >>>>>>>>>>>>>>>>>>> We
> > > >>>>>>>>>>>>>>>>>>>> do
> > > >>>>>>>>>>>>>>>>>>>>> feel that the multi-query optimization component
> > > >>>> can
> > > >>>>> be
> > > >>>>>>>>>>>> adopted to
> > > >>>>>>>>>>>>>>>>>>> wider
> > > >>>>>>>>>>>>>>>>>>>>> use, but probably need more suggestions from the
> > > >>>>>>> community.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in
> > > >>>> Java
> > > >>>>> code, so
> > > >>>>>>>> it
> > > >>>>>>>>>>>>>>> should
> > > >>>>>>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>>>> straightforward to hook it up with SQL shell.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>>>>> Botong
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > > >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Botong,
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this
> > > >>>>> research,
> > > >>>>>>>> and
> > > >>>>>>>>>>>> thank
> > > >>>>>>>>>>>>>>>>>>> you
> > > >>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite:
> > > >>>>> streaming,
> > > >>>>>>>>>>>>>>>>>>> materialized
> > > >>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization.
> > > >>>> As we
> > > >>>>>>> have
> > > >>>>>>>>>>>> already
> > > >>>>>>>>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta
> > > >>>> relational
> > > >>>>>>>>>>>> operators,
> > > >>>>>>>>>>>>>>>>>>> lattice,
> > > >>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to
> > > >> see
> > > >>>>>>> whether
> > > >>>>>>>>>>>> we can
> > > >>>>>>>>>>>>>>>>>>> make
> > > >>>>>>>>>>>>>>>>>>>> them
> > > >>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume
> > > >>>>> others.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that
> > > >>>> your
> > > >>>>>>>>>>>> relations
> > > >>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>> used
> > > >>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure
> > > >>>> streaming
> > > >>>>>>>>>>>> queries, the
> > > >>>>>>>>>>>>>>>>>>> only
> > > >>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find
> > > >>>>> that you
> > > >>>>>>>>>>>> needed
> > > >>>>>>>>>>>>>>> two
> > > >>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view
> > > >> maintenance”
> > > >>>> and
> > > >>>>>>>>>>>> another for
> > > >>>>>>>>>>>>>>>>>>> “user
> > > >>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity
> > > >>>> are
> > > >>>>> so
> > > >>>>>>>>>>>> different?
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the
> > > >>>> arrival of
> > > >>>>>>>>>>>>>>> multi-objective
> > > >>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read
> > > >>>> and
> > > >>>>>>> digest
> > > >>>>>>>>>>>> your
> > > >>>>>>>>>>>>>>>>>>> paper.
> > > >>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth
> > > >>>>> process
> > > >>>>>>> to
> > > >>>>>>>>>>>> create
> > > >>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader
> > > >>>>> community.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making
> > > >> this
> > > >>>>>>>>>>>> functionality
> > > >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can
> > > >>>>> experiment
> > > >>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or
> > > >>>> setting up
> > > >>>>>>>> complex
> > > >>>>>>>>>>>>>>>>>>> databases
> > > >>>>>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the
> > > >> simple
> > > >>>>> DDL
> > > >>>>>>>>>>>> operations
> > > >>>>>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> > > >>>>> whether
> > > >>>>>>> we
> > > >>>>>>>>>>>> could
> > > >>>>>>>>>>>>>>>>>>> devise
> > > >>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Julian
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> [1]
> > > >>>>>>>>>>>>>>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > > >>>>>>>> pkuhbt@gmail.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the
> > > >>>>> figure,
> > > >>>>>>>> please
> > > >>>>>>>>>>>>>>> refer
> > > >>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>> Fig
> > > >>>>>>>>>>>>>>>>>>>>>>> 3(a) in our paper:
> > > >>>>>>>>>>>>>>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>>>>>> Botong
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > > >>>>>>>>>>>> taojiatao@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting, the pic can not be seen in
> > > >>>> the
> > > >>>>> mail,
> > > >>>>>>>>>>>> may you
> > > >>>>>>>>>>>>>>>>>>> open
> > > >>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>> JIRA
> > > >>>>>>>>>>>>>>>>>>>>>>>> for this, people who are interested in this
> > > >> can
> > > >>>>>>> subscribe
> > > >>>>>>>>>>>> to the
> > > >>>>>>>>>>>>>>>>>>>> JIRA?
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Regards!
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite
> > > >>>> optimizer
> > > >>>>>>> into
> > > >>>>>>>> a
> > > >>>>>>>>>>>>>>> general
> > > >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our
> > > >>>> research
> > > >>>>>>> paper
> > > >>>>>>>>>>>>>>>>>>> published
> > > >>>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>>>> VLDB
> > > >>>>>>>>>>>>>>>>>>>>>>>>> 2021:
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer
> > > >>>> framework
> > > >>>>> for
> > > >>>>>>>>>>>>>>> incremental
> > > >>>>>>>>>>>>>>>>>>>> data
> > > >>>>>>>>>>>>>>>>>>>>>>>>> processing
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020
> > > >> illustrating
> > > >>>>> how
> > > >>>>>>>>>>>> Alibaba’s
> > > >>>>>>>>>>>>>>>>>>> data
> > > >>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental
> > > >>>>> query
> > > >>>>>>>>>>>> optimizer
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>>>> alleviate
> > > >>>>>>>>>>>>>>>>>>>>>>>>> cluster-wise resource skewness:
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
> > > >>>>> Resource-Aware
> > > >>>>>>>>>>>>>>> Incremental
> > > >>>>>>>>>>>>>>>>>>>>>>>> Computing
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> To the best of our knowledge, this is the first
> > > >>>> general
> > > >>>>>>>>>>>> cost-based
> > > >>>>>>>>>>>>>>>>>>>>>> incremental
> > > >>>>>>>>>>>>>>>>>>>>>>>>> optimizer that can find the best plan across
> > > >>>>> multiple
> > > >>>>>>>>>>>> families
> > > >>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>>>>>> incremental computing methods, including IVM,
> > > >>>>>>> Streaming,
> > > >>>>>>>>>>>>>>>>>>> DBToaster,
> > > >>>>>>>>>>>>>>>>>>>>>> etc.
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Experiments (in the paper) show that the
> > > >>>>> generated
> > > >>>>>>> best
> > > >>>>>>>>>>>> plan
> > > >>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>>>>>>> consistently much better than the plans from
> > > >>>> each
> > > >>>>>>>>>>>> individual
> > > >>>>>>>>>>>>>>>>>>> method
> > > >>>>>>>>>>>>>>>>>>>>>>>> alone.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is
> > > >>>> central
> > > >>>>> to
> > > >>>>>>>>>>>> database
> > > >>>>>>>>>>>>>>>>>>> view
> > > >>>>>>>>>>>>>>>>>>>>>>>>> maintenance and stream processing systems,
> > > >> and
> > > >>>> is
> > > >>>>>>> being
> > > >>>>>>>>>>>>>>> adopted
> > > >>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>>>> active
> > > >>>>>>>>>>>>>>>>>>>>>>>>> databases, resumable query execution,
> > > >>>> approximate
> > > >>>>>>> query
> > > >>>>>>>>>>>>>>>>>>> processing,
> > > >>>>>>>>>>>>>>>>>>>>>> etc.
> > > >>>>>>>>>>>>>>>>>>>>>>>> We
> > > >>>>>>>>>>>>>>>>>>>>>>>>> are hoping that this feature can help
> > > >> widen
> > > >>>> the
> > > >>>>>>>>>>>> spectrum of
> > > >>>>>>>>>>>>>>>>>>>>>> Calcite,
> > > >>>>>>>>>>>>>>>>>>>>>>>>> solicit more use cases and adoption of
> > > >> Calcite.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical
> > > >>>>> details.
> > > >>>>>>>>>>>> Please
> > > >>>>>>>>>>>>>>>>>>> refer
> > > >>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Tempura paper for more details. We are also
> > > >>>>> working
> > > >>>>>>> on a
> > > >>>>>>>>>>>>>>> journal
> > > >>>>>>>>>>>>>>>>>>>>>> version
> > > >>>>>>>>>>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>>>>>> the paper with more implementation details.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite
> > > >>>> is
> > > >>>>> meant
> > > >>>>>>>> to
> > > >>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>>> executed
> > > >>>>>>>>>>>>>>>>>>>>>>>>> altogether at once. In the proposal,
> > > >> Calcite’s
> > > >>>>> memo
> > > >>>>>>> will
> > > >>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>> extended
> > > >>>>>>>>>>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>>>>>>>>>> temporal information so that it is capable of
> > > >>>>>>> generating
> > > >>>>>>>>>>>>>>>>>>> incremental
> > > >>>>>>>>>>>>>>>>>>>>>>>> plans
> > > >>>>>>>>>>>>>>>>>>>>>>>>> that include multiple sub-plans to execute at
> > > >>>>>>> different
> > > >>>>>>>>>>>> time
> > > >>>>>>>>>>>>>>>>>>> points.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one
> > > >> that
> > > >>>>>>> changes
> > > >>>>>>>>>>>> over
> > > >>>>>>>>>>>>>>> time
> > > >>>>>>>>>>>>>>>>>>>>>> (Time
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> > > >>>>>>> introduced
> > > >>>>>>>>>>>>>>>>>>> TvrMetaSet
> > > >>>>>>>>>>>>>>>>>>>>>> into
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset
> > > >> to
> > > >>>>> track
> > > >>>>>>>>>>>> related
> > > >>>>>>>>>>>>>>>>>>> RelSets
> > > >>>>>>>>>>>>>>>>>>>>>> of a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> changing table (e.g. snapshot of the table at
> > > >>>>> certain
> > > >>>>>>>>>>>> time,
> > > >>>>>>>>>>>>>>>>>>> delta of
> > > >>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>>> table between two time points, etc.).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> [Figure omitted in this archive; see Fig. 3(a) of the Tempura paper]
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each
> > > >> vertical
> > > >>>>> line
> > > >>>>>>> is a
> > > >>>>>>>>>>>>>>>>>>> TvrMetaSet
> > > >>>>>>>>>>>>>>>>>>>>>>>>> representing a TVR (S, R, S left outer join
> > > >> R,
> > > >>>>> etc.).
> > > >>>>>>>>>>>>>>> Horizontal
> > > >>>>>>>>>>>>>>>>>>>> lines
> > > >>>>>>>>>>>>>>>>>>>>>>>>> represent time. Each black dot in the grid
> > > >> is a
> > > >>>>>>> RelSet.
> > > >>>>>>>>>>>> Users
> > > >>>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>>>>>>>>> write
> > > >>>>>>>>>>>>>>>>>>>>>>>> TVR
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Rewrite Rules to describe valid
> > > >> transformations
> > > >>>>>>> between
> > > >>>>>>>>>>>> these
> > > >>>>>>>>>>>>>>>>>>> dots.
> > > >>>>>>>>>>>>>>>>>>>>>> For
> > > >>>>>>>>>>>>>>>>>>>>>>>>> example, the blue lines are inter-TVR rules
> > > >>>> that
> > > >>>>>>>>>>>> describe how
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>> compute
> > > >>>>>>>>>>>>>>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other
> > > >>>>> TVRs.
> > > >>>>>>> The
> > > >>>>>>>>>>>> red
> > > >>>>>>>>>>>>>>> lines
> > > >>>>>>>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>>>>>>>> intra-TVR rules that describe transformations
> > > >>>>> within a
> > > >>>>>>>>>>>> TVR. All
> > > >>>>>>>>>>>>>>>>>>> TVR
> > > >>>>>>>>>>>>>>>>>>>>>>>> rewrite
> > > >>>>>>>>>>>>>>>>>>>>>>>>> rules are logical rules. All existing Calcite
> > > >>>>> rules
> > > >>>>>>>> still
> > > >>>>>>>>>>>> work
> > > >>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>> new
> > > >>>>>>>>>>>>>>>>>>>>>>>>> volcano system without modification.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> All changes in this feature will consist of
> > > >>>> four
> > > >>>>>>> parts:
> > > >>>>>>>>>>>>>>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > >>>>>>>>>>>>>>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching
> > > >>>>> TvrMetaSet
> > > >>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>> RelNodes,
> > > >>>>>>>>>>>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>>>>>>>>>>>> well as links in between the nodes.
> > > >>>>>>>>>>>>>>>>>>>>>>>>> 3. A basic set of TvrRules, written using the
> > > >>>>> upgraded
> > > >>>>>>>>>>>> rule
> > > >>>>>>>>>>>>>>>>>>> engine
> > > >>>>>>>>>>>>>>>>>>>>>> API.
> > > >>>>>>>>>>>>>>>>>>>>>>>>> 4. Multi-query optimization, used to find the
> > > >>>> best
> > > >>>>>>>>>>>> incremental
> > > >>>>>>>>>>>>>>>>>>> plan
> > > >>>>>>>>>>>>>>>>>>>>>>>>> involving multiple time points.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Note that this feature is an extension in
> > > >>>> nature
> > > >>>>> and
> > > >>>>>>>> thus
> > > >>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>>>>>>>>> disabled,
> > > >>>>>>>>>>>>>>>>>>>>>>>>> does not change any existing Calcite
> > > >> behavior.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Other than scenarios in the paper, we also
> > > >>>> applied
> > > >>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>>>> Calcite-extended
> > > >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer to a type of
> > > >>>> periodic
> > > >>>>>>> query
> > > >>>>>>>>>>>> called
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>> ‘‘range
> > > >>>>>>>>>>>>>>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It
> > > >>>> achieved
> > > >>>>> cost
> > > >>>>>>>>>>>> savings
> > > >>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>> 80%
> > > >>>>>>>>>>>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> > > >>>>>>> end-to-end
> > > >>>>>>>>>>>>>>> execution
> > > >>>>>>>>>>>>>>>>>>>>>> time.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> All comments and suggestions are welcome.
> > > >>>> Thanks
> > > >>>>> and
> > > >>>>>>>> happy
> > > >>>>>>>>>>>>>>>>>>> holidays!
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Botong
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~
> > > >>>>>>>>>>>>>>>>>>> no mistakes
> > > >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Viliam Durina
> > > >>>>>>> Jet Developer
> > > >>>>>>>      hazelcast®
> > > >>>>>>>
> > > >>>>>>>  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo,
> > > >> CA
> > > >>>>> 94402 |
> > > >>>>>>> USA
> > > >>>>>>> +1 (650) 521-5453 <(650)%20521-5453> | hazelcast.com <
> > > >> https://www.hazelcast.com>
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Haisheng Yuan <hy...@apache.org>.
Hi Botong,

We haven't heard from you for a while.
Feel free to reach out if you get stuck or need help rebasing the code.

Thanks,
Haisheng

On 2021/05/15 00:54:02, Botong Huang <pk...@gmail.com> wrote: 
> Hi all,
> 
> Thank you all for the interest, and thanks Julian for the update!
> 
> I am having problems uploading the PDF files to the JIRA case CALCITE-4568
> <https://issues.apache.org/jira/browse/CALCITE-4568>, so I have added the
> slides to our code base:
> https://github.com/alibaba/cost-based-incremental-optimizer/blob/main/Tempura_Calcite_presentation.pdf
> 
> The slides contain a walk-through example of how Tempura expands its memo.
> The current version of the code also has two end-to-end tests,
> TvrOptimizationTest.java and TvrExecutionTest.java. Please feel free to
> start playing with them, and feel free to reach out and possibly schedule
> another meeting if needed.
> 
> As agreed in the meeting, we will rebase our code to a newer version of
> Calcite.
> 
> Best,
> Botong
> 
> On Thu, May 13, 2021 at 12:47 PM Julian Hyde <jh...@gmail.com> wrote:
> 
> > During the meeting we agreed to start progressing this contribution in the
> > usual Apache Way, with conversations on the dev list and in the
> > https://issues.apache.org/jira/browse/CALCITE-4568 <
> > https://issues.apache.org/jira/browse/CALCITE-4568> JIRA case. So, it
> > should be easy for you to participate.
> >
> > Botong said he would share the slides. (He might be unwilling to make them
> > public, because they are his presentation for a conference that has not
> > happened yet. Reach out to him one-to-one.)
> >
> > Next step is for someone on the Alibaba side to create a PR that is
> > rebased on the latest Calcite master, and add a comment to the JIRA case.
> > Then we can discuss what needs to be done for that PR. Code quality, adding
> > comments, breaking up into smaller commits, additional tests, renaming
> > packages/classes, restructuring into plugins are all possibilities.
> >
> > Our side of the bargain, as committers, is that we should review in a
> > timely manner, and not move the goal posts — if the contributors make the
> > changes we request then we will land this code in master in a reasonable
> > amount of time.
> >
> > We also discussed incremental view maintenance (IVM). Tempura solves a
> > more general problem (finding the optimal K steps to maintain a
> > materialized view as data arrives in K points in time) but if we set K=2,
> > we can generate a plan for how to update a materialized view given a delta
> > table. The plan will be different based on cost - e.g. whether the delta
> > table is small or large. This is a problem that many of our users would
> > like to solve. It will exercise much of Tempura’s code base, and encourage
> > contributions.
> >
> > In my opinion, we should do IVM at launch. It should be the main example
> > we use in conference talks, blog posts, etc. When people understand that
> > case, we can explain how we generalize from K=2 to arbitrary K.
> >
> > Julian
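
Illustrative sketch of the K=2 case described above, where the plan for refreshing a materialized view depends on the estimated size of the delta table. This is not Tempura or Calcite code. The plan names, the cost formulas, and the `lookup_cost` constant are assumptions made up for illustration; only the idea that the cheaper plan changes with the delta size comes from the message.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float


def candidate_plans(base_rows: int, delta_rows: int,
                    lookup_cost: float = 20.0) -> List[Plan]:
    """Enumerate two hypothetical ways to refresh a materialized view.

    The cost formulas are placeholders, chosen only to make the
    trade-off visible; a real optimizer derives costs from statistics.
    """
    # Recompute the view from scratch: scan the base data plus the delta.
    full = Plan("recompute-from-scratch", float(base_rows + delta_rows))
    # Apply the delta incrementally: pay a per-row lookup against the view.
    incremental = Plan("apply-delta", delta_rows * lookup_cost)
    return [full, incremental]


def choose_plan(base_rows: int, delta_rows: int) -> Plan:
    """Pick the cheapest candidate under the placeholder cost model."""
    return min(candidate_plans(base_rows, delta_rows), key=lambda p: p.cost)


if __name__ == "__main__":
    for delta in (100, 500_000):
        best = choose_plan(base_rows=1_000_000, delta_rows=delta)
        print(delta, best.name, best.cost)
```

Under these made-up numbers a small delta favours the incremental plan and a large delta favours recomputation, which is the cost-dependent behaviour the message refers to.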
> >
> >
> > > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> > >
> > > I apologize that I had a wrong impression on the meeting time (I thought
> > it
> > > should be on Thursday but it is Wednesday). I can follow up your meeting
> > > records if you have any.
> > >
> > >
> > > -Rui
> > >
> > > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> This is a reminder that we are going to have our second discussion
> > meeting
> > >> tomorrow at 10-11pm PST. Please find the link below, everyone is
> > welcome to
> > >> join!
> > >>
> > >> Join Zoom Meeting
> > >> https://uci.zoom.us/j/91986206610
> > >> <
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91986206610&sa=D&source=calendar&usd=2&usg=AOvVaw24sxPtI6hbukCSo3nlQsbn
> > >>>
> > >>
> > >> Meeting ID: 919 8620 6610
> > >> One tap mobile
> > >> +16699006833 <(669)%20900-6833>,,91986206610# US (San Jose)
> > >> +12532158782 <(253)%20215-8782>,,91986206610# US (Tacoma)
> > >>
> > >> Dial by your location
> > >>        +1 669 900 6833 <(669)%20900-6833> US (San Jose)
> > >>        +1 253 215 8782 <(253)%20215-8782> US (Tacoma)
> > >>        +1 346 248 7799 <(346)%20248-7799> US (Houston)
> > >>        +1 301 715 8592 <(301)%20715-8592> US (Washington DC)
> > >>        +1 312 626 6799 <(312)%20626-6799> US (Chicago)
> > >>        +1 646 558 8656 <(646)%20558-8656> US (New York)
> > >> Meeting ID: 919 8620 6610
> > >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> > >> <
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FacyXcc43Cd&sa=D&source=calendar&usd=2&usg=AOvVaw2W08kj_8hEx44dryeZlXb6
> > >>>
> > >>
> > >> Join by Skype for Business
> > >> https://uci.zoom.us/skype/91986206610
> > >> <
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91986206610&sa=D&source=calendar&usd=2&usg=AOvVaw3w0M0YYbcjPyBXzNpyyk0Z
> > >>>
> > >>
> > >> Thanks,
> > >> Botong
> > >>
> > >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
> > >>
> > >>> Hi Stamatis and all,
> > >>>
> > >>> Thanks for the interest! Let's tentatively schedule the next meeting
> > next
> > > >>> Wednesday, May 12, 10pm-11pm PST then. Please let us know if any new
> > > >>> needs come up.
> > >>>
> > >>> Best,
> > >>> Botong
> > >>>
> > >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hello,
> > >>>>
> > >>>> I really regret missing the first meeting, sorry about that. I added
> > my
> > >>>> preferences in the document.
> > >>>> I will make sure to attend the next one and help as much as I can.
> > >>>>
> > >>>> I didn't have the chance yet to go over the paper but will try to do
> > it
> > >>>> before the next meeting.
> > >>>>
> > >>>> For me the following dates are more convenient than others so it would
> > >> be
> > >>>> nice if we could arrange it then.
> > >>>>
> > >>>> Thu, May 6, 10pm PST
> > >>>> Tue, May 12, 10pm PST
> > >>>>
> > >>>> Best,
> > >>>> Stamatis
> > >>>>
> > >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
> > >>>>
> > >>>>> I have added my time preferences to the doc [1]. I am generally
> > >>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
> > >>>>>
> > >>>>> Stamatis, Jesus, Given the complexity of this work, I would very much
> > >>>>> appreciate your insight, as experts in optimizer theory. Could one of
> > >>>>> you join the next meeting? Of course we should choose a time that
> > >>>>> works for everyone's schedule.
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>> [1]
> > >>>>>
> > >>>>
> > >>
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>
> > >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
> > >> wrote:
> > >>>>>>
> > > >>>>>> We didn't record it, but we will try to record the following meetings.
> > >>>> Please
> > >>>>>> add your time preference in the docs, so that we can find a meeting
> > >>>> time
> > >>>>>> that works for more people.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Botong
> > >>>>>>
> > >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
> > >> viliam@hazelcast.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> Is there a recording available?
> > >>>>>>> Viliam
> > >>>>>>>
> > >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> The meeting yesterday was fun and productive. As discussed, this
> > >>>> is
> > >>>>> the
> > >>>>>>>> call to schedule our second meeting.
> > >>>>>>>>
> > >>>>>>>> We encourage everyone to add their time preferences during
> > >> 05/01 -
> > >>>>> 05/15
> > >>>>>>>> here:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Botong
> > >>>>>>>>
> > >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi all,
> > >>>>>>>>> We've created a zoom meeting below for our meeting next Monday
> > >>>>>>>>> (9pm-10:30pm PST on 04/26).
> > >>>>>>>>> Talk to you all soon!
> > >>>>>>>>>
> > >>>>>>>>> Join Zoom Meeting
> > >>>>>>>>> https://uci.zoom.us/j/91279732686
> > >>>>>>>>> <
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Meeting ID: 912 7973 2686
> > >>>>>>>>> One tap mobile
> > >>>>>>>>> +16699006833 <(669)%20900-6833>,,91279732686# US (San Jose)
> > >>>>>>>>> +12532158782 <(253)%20215-8782>,,91279732686# US (Tacoma)
> > >>>>>>>>>
> > >>>>>>>>> Dial by your location
> > >>>>>>>>> +1 669 900 6833 <(669)%20900-6833> US (San Jose)
> > >>>>>>>>> +1 253 215 8782 <(253)%20215-8782> US (Tacoma)
> > >>>>>>>>> +1 346 248 7799 <(346)%20248-7799> US (Houston)
> > >>>>>>>>> +1 301 715 8592 <(301)%20715-8592> US (Washington DC)
> > >>>>>>>>> +1 312 626 6799 <(312)%20626-6799> US (Chicago)
> > >>>>>>>>> +1 646 558 8656 <(646)%20558-8656> US (New York)
> > >>>>>>>>> Meeting ID: 912 7973 2686
> > >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > >>>>>>>>> <
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Join by Skype for Business
> > >>>>>>>>> https://uci.zoom.us/skype/91279732686
> > >>>>>>>>> <
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Botong
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
> > >> pkuhbt@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi all,
> > >>>>>>>>>>
> > >>>>>>>>>> According to the preferences collected, we are tentatively
> > >>>>> scheduling
> > >>>>>>>> our
> > >>>>>>>>>> meeting at 9pm-10:30pm PST on 04/26 Monday.
> > >>>>>>>>>>
> > >>>>>>>>>> We will give a presentation about Tempura, followed by a free
> > >>>>>>>> discussion.
> > >>>>>>>>>>
> > > >>>>>>>>>> Please let us know if there are any other requests. A few days
> > >>>>> before
> > >>>>>>>>>> the meeting, I will send out a zoom meeting link.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Botong
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
> > >> pkuhbt@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Julian and all,
> > >>>>>>>>>>>
> > >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take
> > >> a
> > >>>>> quick
> > >>>>>>>> peek
> > >>>>>>>>>>> at the last five commits.
> > >>>>>>>>>>>
> > >>>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > >>>>>>>>>>>
> > >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
> > >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>),
> > >> which
> > >>>>> will
> > >>>>>>>> serve
> > >>>>>>>>>>> as the umbrella Jira for the feature.
> > >>>>>>>>>>>
> > >>>>>>>>>>> In the meantime, we encourage everyone to enter the time
> > >>>>> preferences
> > >>>>>>>> for
> > >>>>>>>>>>> our first meeting here:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>> Botong
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> > >>>>> jhyde.apache@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I have added my time preferences to the doc.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Initial discussions will need to be about architecture and
> > >>>>>>> high-level
> > >>>>>>>>>>>> design. So I would ask Calcite reviewers not to review the
> > >> PR
> > >>>>>>>> line-by-line
> > >>>>>>>>>>>> (or to leave comments in GitHub) but try to understand the
> > >>>>> design
> > >>>>>>>>>>>> holistically, and prepare questions/comments before the
> > >>>> meeting.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Botong, Can you please create a Calcite JIRA case for this
> > >>>> task?
> > >>>>>>> JIRA
> > > >>>>>>>>>>>> is how we track long-running tasks such as this.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Julian
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
> > >> pkuhbt@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Apology for the delay. It took us some time to clean up
> > >> our
> > >>>>> code
> > >>>>>>>> base
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
> > >>>> peek.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
> > >>>> for a
> > >>>>> Zoom
> > >>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
> > >> Calcite.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Since some of our team members are in China, we prefer
> > >> the
> > >>>>> time
> > >>>>>>> slot
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
> > >> preference
> > >>>> in
> > >>>>> the
> > >>>>>>>>>>>> shared
> > >>>>>>>>>>>>> doc below.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We encourage everyone to add their time preferences
> > >> (during
> > >>>>>>>>>>>> 04/15-04/30) in
> > >>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
> > >>>> that
> > >>>>> works
> > >>>>>>>> for
> > >>>>>>>>>>>>> most.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> > >>>>> pkuhbt@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Julian and Rui,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
> > >>>> some
> > >>>>>>> slides
> > >>>>>>>>>>>> for the
> > >>>>>>>>>>>>>> meeting.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
> > >> free
> > >>>> to
> > >>>>> add
> > >>>>>>>>>>>> more in
> > >>>>>>>>>>>>>> here:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > >>>>>>>> jhyde.apache@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
> > >>>>> idea. I
> > >>>>>>>>>>>> think we
> > >>>>>>>>>>>>>>> should create it to continue discussion after the first
> > >>>>> meeting.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Julian
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > >>>>>>>> jhyde.apache@gmail.com>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
> > >>>> The
> > >>>>> PR
> > >>>>>>> will
> > >>>>>>>>>>>> allow
> > >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
> > >>>>> round of
> > >>>>>>>>>>>> questions
> > >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
> > >>>>>>>>>>>> presentation of the
> > >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
> > >>>> present
> > >>>>> at
> > >>>>>>>> VLDB,
> > >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
> > >>>> concepts,
> > >>>>> which
> > >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
> > >>>> onto
> > >>>>>>> other
> > >>>>>>>>>>>> current
> > >>>>>>>>>>>>>>> and future concepts in calcite.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
> > >>>>> line-by-line
> > >>>>>>> at
> > >>>>>>>>>>>> this
> > >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
> > >> and
> > >>>>> design
> > >>>>>>>>>>>> choices. If
> > >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
> > >>>> details.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
> > >> doubt
> > >>>>> that we
> > >>>>>>>>>>>> will be
> > >>>>>>>>>>>>>>> able to integrate everything, but we can build
> > >>>> understanding
> > >>>>>>> about
> > >>>>>>>>>>>> where
> > >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
> > >>>> of
> > >>>>> code
> > >>>>>>> to
> > >>>>>>>>>>>> help us
> > >>>>>>>>>>>>>>> get there.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> As I said before, after the integration I would like
> > >>>>> people to
> > >>>>>>> be
> > >>>>>>>>>>>> able
> > >>>>>>>>>>>>>>> to experiment with it and use it in their production
> > >>>>> systems.
> > >>>>>>>> That
> > >>>>>>>>>>>> way, it
> > >>>>>>>>>>>>>>> will not be an experiment that withers, but a feature
> > >> set
> > >>>>>>>>>>>> integrates with
> > >>>>>>>>>>>>>>> other calcite features and gets stronger over time.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Julian
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> > >>>>> amaliujia@apache.org>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> For me to participate in the discussion for the
> > >> above
> > >>>>>>>> questions,
> > >>>>>>>>>>>> I
> > >>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>> need to read a lot more to know relevant context and
> > >>>>> likely
> > >>>>>>> ask
> > >>>>>>>>>>>> lots of
> > > >>>>>>>>>>>>>>>>> questions :-).  An editable doc is probably good for
> > >>>>> questions
> > >>>>>>>> and
> > >>>>>>>>>>>> back
> > >>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>> forward discussion.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> -Rui
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > >>>>>>>> amaliujia@apache.org
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
> > >>>>> (review
> > >>>>>>>> code
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>>>> doc,
> > >>>>>>>>>>>>>>>>>> etc.).
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> While you can share your code so people can have
> > >> more
> > >>>>> idea
> > >>>>>>> how
> > >>>>>>>>>>>> it is
> > >>>>>>>>>>>>>>>>>> implemented, I think it would be also nice to have a
> > >>>> doc
> > >>>>> to
> > >>>>>>>>>>>> discuss
> > >>>>>>>>>>>>>>> open
> > > >>>>>>>>>>>>>>>>>> questions above. Some points that I have copied
> > >>>> here:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing
> > >>>>> solutions in
> > >>>>>>>>>>>> Calcite
> > >>>>>>>>>>>>>>>>>> Streaming, materialized view maintenance, and
> > >>>> multi-query
> > >>>>>>>>>>>> optimization
> > >>>>>>>>>>>>>>>>>> (Sigma and Delta relational operators, lattice, and
> > >>>> Spool
> > >>>>>>>>>>>> operator),
> > >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost
> > >>>> models
> > >>>>> -
> > >>>>>>> one
> > >>>>>>>>>>>> for
> > >>>>>>>>>>>>>>> “view
> > >>>>>>>>>>>>>>>>>> maintenance” and another for “user queries” - since
> > >>>> the
> > >>>>>>>>>>>> objectives of
> > >>>>>>>>>>>>>>> each
> > >>>>>>>>>>>>>>>>>> activity are so different?
> > >>>>>>>>>>>>>>>>>> 3. whether this work will hasten the arrival of
> > >>>>>>> multi-objective
> > >>>>>>>>>>>>>>> parametric
> > >>>>>>>>>>>>>>>>>> query optimization [1] in Calcite.
> > >>>>>>>>>>>>>>>>>> 4. probably SQL shell support.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> [1]:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> -Rui
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> > >>>>> zinking3@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > >>>>>>>>>>>> pkuhbt@gmail.com>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Julian,
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
> > >>>>> wondering
> > >>>>>>> if
> > >>>>>>>> it
> > >>>>>>>>>>>>>>> would
> > >>>>>>>>>>>>>>>>>>> help
> > >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > >>>>>>>> pkuhbt@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Hi Julian,
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
> > >>>> plan
> > >>>>>>> that
> > >>>>>>>>>>>> best
> > >>>>>>>>>>>>>>>>>>> benefits
> > >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
> > >>>>> hopefully
> > >>>>>>>>>>>> answer
> > >>>>>>>>>>>>>>> your
> > >>>>>>>>>>>>>>>>>>>>> questions.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of
> > >>>> time
> > >>>>>>> points
> > >>>>>>>> to
> > >>>>>>>>>>>>>>>>>>> consider
> > >>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users'
> > >>>>>>> preference
> > >>>>>>>>>>>> over
> > >>>>>>>>>>>>>>>>>>> time,
> > >>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan
> > >>>> that
> > >>>>>>>>>>>> minimizes the
> > >>>>>>>>>>>>>>>>>>>> overall
> > >>>>>>>>>>>>>>>>>>>>> cost function.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at
> > >>>> different
> > >>>>> time
> > >>>>>>>>>>>> points
> > >>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to
> > >> identical
> > >>>>> plans
> > >>>>>>> in
> > >>>>>>>>>>>> all
> > >>>>>>>>>>>>>>> delta
> > >>>>>>>>>>>>>>>>>>>> runs
> > > >>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in §2.1 of
> > >> the
> > >>>>>>> Tempura
> > >>>>>>>>>>>> paper,
> > >>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>>>> mimic the current streaming implementation by
> > >>>>> specifying
> > >>>>>>> two
> > >>>>>>>>>>>>>>> (logical)
> > >>>>>>>>>>>>>>>>>>>> time
> > >>>>>>>>>>>>>>>>>>>>> points in Tempura, representing the initial run
> > >> and
> > >>>>> later
> > >>>>>>>>>>>> delta
> > >>>>>>>>>>>>>>> runs
> > >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura
> > >>>> supports
> > >>>>>>> various
> > > >>>>>>>>>>>> forms
> > >>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>> incremental computing, not only the small-delta
> > >>>>>>> append-only
> > >>>>>>>>>>>> data
> > >>>>>>>>>>>>>>>>>>> model in
> > >>>>>>>>>>>>>>>>>>>>> streaming systems. That's why we believe Tempura
> > >>>>> subsumes
> > >>>>>>>> the
> > >>>>>>>>>>>>>>> current
> > >>>>>>>>>>>>>>>>>>>>> streaming support, as well as any IVM
> > >>>> implementations.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a
> > > >>>>> separate
> > >>>>>>>> cost
> > >>>>>>>>>>>>>>> model,
> > >>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>> rather extended the existing one. Similar to
> > >>>>>>> multi-objective
> > >>>>>>>>>>>>>>>>>>>> optimization,
> > >>>>>>>>>>>>>>>>>>>>> costs incurred at different time points are
> > >>>> considered
> > >>>>>>>>>>>> different
> > >>>>>>>>>>>>>>>>>>>>> dimensions. Tempura lets users supply a function
> > >>>> that
> > >>>>>>>>>>>> converts this
> > >>>>>>>>>>>>>>>>>>> cost
> > >>>>>>>>>>>>>>>>>>>>> vector into a final cost. So under this function,
> > >>>> any
> > >>>>> two
> > >>>>>>>>>>>>>>> incremental
> > >>>>>>>>>>>>>>>>>>>> plans
> > >>>>>>>>>>>>>>>>>>>>> are still comparable and there is an overall
> > >>>> optimum.
> > >>>>> I
> > >>>>>>>> guess
> > >>>>>>>>>>>> we
> > >>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>> go
> > >>>>>>>>>>>>>>>>>>>>> down the route of multi-objective parametric
> > >> query
> > >>>>>>>>>>>> optimization
> > >>>>>>>>>>>>>>>>>>> instead
> > >>>>>>>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>>>>>> there is a need.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Next on materialized views and multi-query
> > >>>>> optimization,
> > >>>>>>>>>>>> since our
> > >>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves
> > >>>> materializing
> > >>>>>>>>>>>> intermediate
> > >>>>>>>>>>>>>>>>>>>> results
> > >>>>>>>>>>>>>>>>>>>>> for later time points, we need to solve the
> > >>>> problem of
> > >>>>>>>>>>>> choosing
> > >>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving
> > >> and
> > >>>>>>> reusing
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing
> > >> plans.
> > >>>> We
> > >>>>>>>>>>>> borrowed the
> > >>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this
> > >>>>> problem
> > >>>>>>>> even
> > >>>>>>>>>>>>>>> though
> > >>>>>>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we
> > >>>> think
> > >>>>> our
> > >>>>>>>> work
> > >>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>> orthogonal
> > >>>>>>>>>>>>>>>>>>>>> to Calcite's facilities around utilizing existing
> > >>>>> views,
> > >>>>>>>>>>>> lattice
> > >>>>>>>>>>>>>>> etc.
> > >>>>>>>>>>>>>>>>>>> We
> > >>>>>>>>>>>>>>>>>>>> do
> > >>>>>>>>>>>>>>>>>>>>> feel that the multi-query optimization component
> > >>>> can
> > >>>>> be
> > >>>>>>>>>>>> adopted to
> > >>>>>>>>>>>>>>>>>>> wider
> > >>>>>>>>>>>>>>>>>>>>> use, but probably need more suggestions from the
> > >>>>>>> community.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in
> > >>>> java
> > >>>>> code,
> > >>>>>>>> it
> > >>>>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>> straightforward to hook it up with SQL shell.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Botong,
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this
> > >>>>> research,
> > >>>>>>>> and
> > >>>>>>>>>>>> thank
> > >>>>>>>>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite:
> > >>>>> streaming,
> > >>>>>>>>>>>>>>>>>>> materialized
> > >>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization.
> > >>>> As we
> > >>>>>>> have
> > >>>>>>>>>>>> already
> > >>>>>>>>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta
> > >>>> relational
> > >>>>>>>>>>>> operators,
> > >>>>>>>>>>>>>>>>>>> lattice,
> > >>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to
> > >> see
> > >>>>>>> whether
> > >>>>>>>>>>>> we can
> > >>>>>>>>>>>>>>>>>>> make
> > >>>>>>>>>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume
> > >>>>> others.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that
> > >>>> your
> > >>>>>>>>>>>> relations
> > >>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>> used
> > >>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure
> > >>>> streaming
> > >>>>>>>>>>>> queries, the
> > >>>>>>>>>>>>>>>>>>> only
> > >>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find
> > >>>>> that you
> > >>>>>>>>>>>> needed
> > >>>>>>>>>>>>>>> two
> > >>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view
> > >> maintenance”
> > >>>> and
> > >>>>>>>>>>>> another for
> > >>>>>>>>>>>>>>>>>>> “user
> > >>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity
> > >>>> are
> > >>>>> so
> > >>>>>>>>>>>> different?
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the
> > >>>> arrival of
> > >>>>>>>>>>>>>>> multi-objective
> > >>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read
> > >>>> and
> > >>>>>>> digest
> > >>>>>>>>>>>> your
> > >>>>>>>>>>>>>>>>>>> paper.
> > >>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth
> > >>>>> process
> > >>>>>>> to
> > >>>>>>>>>>>> create
> > >>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader
> > >>>>> community.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making
> > >> this
> > >>>>>>>>>>>> functionality
> > >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can
> > >>>>> experiment
> > >>>>>>>>>>>> with
> > >>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or
> > >>>> setting up
> > >>>>>>>> complex
> > >>>>>>>>>>>>>>>>>>> databases
> > >>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the
> > >> simple
> > >>>>> DDL
> > >>>>>>>>>>>> operations
> > >>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> > >>>>> whether
> > >>>>>>> we
> > >>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>>>> devise
> > >>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Julian
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> > https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > >>>>>>>> pkuhbt@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the
> > >>>>> figure,
> > >>>>>>>> please
> > >>>>>>>>>>>>>>> refer
> > >>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>> Fig
> > >>>>>>>>>>>>>>>>>>>>>>> 3(a) in our paper:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > >>>>>>>>>>>> taojiatao@gmail.com>
> > >>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting, the pic can not be seen in
> > >>>> the
> > >>>>> mail,
> > >>>>>>>>>>>> may you
> > >>>>>>>>>>>>>>>>>>> open
> > >>>>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>>>>> JIRA
> > >>>>>>>>>>>>>>>>>>>>>>>> for this, people who are interested in this
> > >> can
> > >>>>>>> subscribe
> > >>>>>>>>>>>> to the
> > >>>>>>>>>>>>>>>>>>>> JIRA?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Regards!
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Botong Huang <bo...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
> > >>>>>>>>>>>>>>>>>>>>>>>>> general incremental query optimizer, based on our research
> > >>>>>>>>>>>>>>>>>>>>>>>>> paper published in VLDB 2021:
> > >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
> > >>>>>>>>>>>>>>>>>>>>>>>>> incremental data processing
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
> > >>>>>>>>>>>>>>>>>>>>>>>>> Alibaba’s data warehouse is planning to use this
> > >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer to alleviate cluster-wise
> > >>>>>>>>>>>>>>>>>>>>>>>>> resource skewness:
> > >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
> > >>>>>>>>>>>>>>>>>>>>>>>>> Incremental Computing
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> To the best of our knowledge, this is the first general
> > >>>>>>>>>>>>>>>>>>>>>>>>> cost-based incremental optimizer that can find the best
> > >>>>>>>>>>>>>>>>>>>>>>>>> plan across multiple families of incremental computing
> > >>>>>>>>>>>>>>>>>>>>>>>>> methods, including IVM, Streaming, DBToaster, etc.
> > >>>>>>>>>>>>>>>>>>>>>>>>> Experiments (in the paper) show that the generated best
> > >>>>>>>>>>>>>>>>>>>>>>>>> plan is consistently much better than the plans from each
> > >>>>>>>>>>>>>>>>>>>>>>>>> individual method alone.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is central to
> > >>>>>>>>>>>>>>>>>>>>>>>>> database view maintenance and stream processing systems,
> > >>>>>>>>>>>>>>>>>>>>>>>>> and is being adopted in active databases, resumable query
> > >>>>>>>>>>>>>>>>>>>>>>>>> execution, approximate query processing, etc. We hope that
> > >>>>>>>>>>>>>>>>>>>>>>>>> this feature can help widen the spectrum of Calcite and
> > >>>>>>>>>>>>>>>>>>>>>>>>> attract more use cases and wider adoption of Calcite.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical details.
> > >>>>>>>>>>>>>>>>>>>>>>>>> Please refer to the Tempura paper for more details. We are
> > >>>>>>>>>>>>>>>>>>>>>>>>> also working on a journal version of the paper with more
> > >>>>>>>>>>>>>>>>>>>>>>>>> implementation details.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Currently, the query plan generated by Calcite is meant to
> > >>>>>>>>>>>>>>>>>>>>>>>>> be executed all at once. In the proposal, Calcite’s memo
> > >>>>>>>>>>>>>>>>>>>>>>>>> will be extended with temporal information so that it is
> > >>>>>>>>>>>>>>>>>>>>>>>>> capable of generating incremental plans that include
> > >>>>>>>>>>>>>>>>>>>>>>>>> multiple sub-plans to execute at different time points.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one that changes
> > >>>>>>>>>>>>>>>>>>>>>>>>> over time (a Time-Varying Relation, or TVR). To achieve
> > >>>>>>>>>>>>>>>>>>>>>>>>> that, we introduced TvrMetaSet into Calcite’s memo,
> > >>>>>>>>>>>>>>>>>>>>>>>>> alongside RelSet and RelSubset, to track the related
> > >>>>>>>>>>>>>>>>>>>>>>>>> RelSets of a changing table (e.g. a snapshot of the table
> > >>>>>>>>>>>>>>>>>>>>>>>>> at a certain time, the delta of the table between two time
> > >>>>>>>>>>>>>>>>>>>>>>>>> points, etc.).
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> For example, in the above figure, each vertical line is a
> > >>>>>>>>>>>>>>>>>>>>>>>>> TvrMetaSet representing a TVR (S, R, S left outer join R,
> > >>>>>>>>>>>>>>>>>>>>>>>>> etc.). Horizontal lines represent time. Each black dot in
> > >>>>>>>>>>>>>>>>>>>>>>>>> the grid is a RelSet. Users can write TVR Rewrite Rules to
> > >>>>>>>>>>>>>>>>>>>>>>>>> describe valid transformations between these dots. For
> > >>>>>>>>>>>>>>>>>>>>>>>>> example, the blue lines are inter-TVR rules that describe
> > >>>>>>>>>>>>>>>>>>>>>>>>> how to compute a certain RelSet of a TVR from RelSets of
> > >>>>>>>>>>>>>>>>>>>>>>>>> other TVRs. The red lines are intra-TVR rules that describe
> > >>>>>>>>>>>>>>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are
> > >>>>>>>>>>>>>>>>>>>>>>>>> logical rules. All existing Calcite rules still work in the
> > >>>>>>>>>>>>>>>>>>>>>>>>> new volcano system without modification.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> All changes in this feature will consist of four parts:
> > >>>>>>>>>>>>>>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > >>>>>>>>>>>>>>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
> > >>>>>>>>>>>>>>>>>>>>>>>>>    RelNodes, as well as links in between the nodes.
> > >>>>>>>>>>>>>>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule
> > >>>>>>>>>>>>>>>>>>>>>>>>>    engine API.
> > >>>>>>>>>>>>>>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
> > >>>>>>>>>>>>>>>>>>>>>>>>>    incremental plan involving multiple time points.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Note that this feature is an extension in nature and thus,
> > >>>>>>>>>>>>>>>>>>>>>>>>> when disabled, does not change any existing Calcite
> > >>>>>>>>>>>>>>>>>>>>>>>>> behavior.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Beyond the scenarios in the paper, we also applied this
> > >>>>>>>>>>>>>>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of
> > >>>>>>>>>>>>>>>>>>>>>>>>> periodic query called the "range query" in Alibaba’s data
> > >>>>>>>>>>>>>>>>>>>>>>>>> warehouse. It achieved cost savings of 80% on total CPU and
> > >>>>>>>>>>>>>>>>>>>>>>>>> memory consumption, and 60% on end-to-end execution time.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
> > >>>>>>>>>>>>>>>>>>>>>>>>> holidays!
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~
> > >>>>>>>>>>>>>>>>>>> no mistakes
> > >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Viliam Durina
> > >>>>>>> Jet Developer
> > >>>>>>>      hazelcast®
> > >>>>>>>
> > >>>>>>>  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
> > >>>>>>> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
> 

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi all,

Thank you all for the interest, and thanks Julian for the update!

I am having problems uploading the PDF files to the JIRA case CALCITE-4568
<https://issues.apache.org/jira/browse/CALCITE-4568>, so I added the
slides to our code base:
https://github.com/alibaba/cost-based-incremental-optimizer/blob/main/Tempura_Calcite_presentation.pdf

The slides contain a walk-through example of how Tempura expands its memo.
The current version of the code also has two end-to-end unit tests,
TvrOptimizationTest.java and TvrExecutionTest.java. Please feel free to
start playing with them, and to reach out and schedule another meeting if
needed.

As agreed in the meeting, we will rebase our code to a newer version of
Calcite.

Best,
Botong

On Thu, May 13, 2021 at 12:47 PM Julian Hyde <jh...@gmail.com> wrote:

> During the meeting we agreed to start progressing this contribution in the
> usual Apache Way, with conversations on the dev list and in the
> https://issues.apache.org/jira/browse/CALCITE-4568 JIRA case. So, it
> should be easy for you to participate.
>
> Botong said he would share the slides. (He might be unwilling to make them
> public, because they are his presentation for a conference that has not
> happened yet. Reach out to him one-to-one.)
>
> The next step is for someone on the Alibaba side to create a PR that is
> rebased on the latest Calcite master, and add a comment to the JIRA case.
> Then we can discuss what needs to be done for that PR. Code quality, adding
> comments, breaking up into smaller commits, additional tests, renaming
> packages/classes, restructuring into plugins are all possibilities.
>
> Our side of the bargain, as committers, is that we should review in a
> timely manner, and not move the goal posts — if the contributors make the
> changes we request then we will land this code in master in a reasonable
> amount of time.
>
> We also discussed incremental view maintenance (IVM). Tempura solves a
> more general problem (finding the optimal K steps to maintain a
> materialized view as data arrives in K points in time) but if we set K=2,
> we can generate a plan for how to update a materialized view given a delta
> table. The plan will be different based on cost - e.g. whether the delta
> table is small or large. This is a problem that many of our users would
> like to solve. It will exercise much of Tempura’s code base, and encourage
> contributions.
>
> In my opinion, we should do IVM at launch. It should be the main example
> we use in conference talks, blog posts, etc. When people understand that
> case, we can explain how we generalize from K=2 to arbitrary K.
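
A toy sketch of the K=2 case described above: a materialized join view
V = R join S is refreshed either incrementally (join only the delta against
S) or by full recomputation, whichever a cost estimate says is cheaper. This
is not Tempura or Calcite code. The function names, the cost formulas, and
the PROBE_COST constant are all assumptions made up for illustration, and
the delta is assumed to be append-only rows for R.

```python
from collections import defaultdict

# Hypothetical cost of probing S once per delta row (e.g. an index lookup).
PROBE_COST = 5


def hash_join(r_rows, s_rows):
    """Equi-join R rows (key, a) with S rows (key, b) on key."""
    index = defaultdict(list)
    for key, b in s_rows:
        index[key].append(b)
    return [(key, a, b) for key, a in r_rows for b in index.get(key, [])]


def choose_plan(r_size, s_size, delta_size):
    """Pick the cheaper refresh plan under a made-up cost model."""
    incremental_cost = PROBE_COST * delta_size     # probe S per delta row
    recompute_cost = r_size + delta_size + s_size  # scan every input once
    return "incremental" if incremental_cost < recompute_cost else "recompute"


def refresh_view(view, r_rows, s_rows, delta_rows):
    """Return (plan, new_view) after delta_rows are appended to R."""
    plan = choose_plan(len(r_rows), len(s_rows), len(delta_rows))
    if plan == "incremental":
        # Time point 2 only joins the delta and appends to the old view.
        new_view = view + hash_join(delta_rows, s_rows)
    else:
        # Time point 2 throws the old view away and rebuilds it.
        new_view = hash_join(r_rows + delta_rows, s_rows)
    return plan, new_view
```

Both branches produce the same rows (possibly in a different order); only
the work done at the second time point differs, and which branch is chosen
depends on how large the delta is relative to the inputs.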
>
> Julian
>
>
> > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> >
> > I apologize: I had the wrong impression of the meeting time (I thought
> > it was on Thursday, but it was on Wednesday). I can follow up on your
> > meeting records if you have any.
> >
> >
> > -Rui
> >
> > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> This is a reminder that we are going to have our second discussion
> >> meeting tomorrow at 10-11pm PST. Please find the link below. Everyone is
> >> welcome to join!
> >>
> >> Join Zoom Meeting
> >> https://uci.zoom.us/j/91986206610
> >>
> >> Meeting ID: 919 8620 6610
> >> One tap mobile
> >> +16699006833,,91986206610# US (San Jose)
> >> +12532158782,,91986206610# US (Tacoma)
> >>
> >> Dial by your location
> >>        +1 669 900 6833 US (San Jose)
> >>        +1 253 215 8782 US (Tacoma)
> >>        +1 346 248 7799 US (Houston)
> >>        +1 301 715 8592 US (Washington DC)
> >>        +1 312 626 6799 US (Chicago)
> >>        +1 646 558 8656 US (New York)
> >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> >>
> >> Join by Skype for Business
> >> https://uci.zoom.us/skype/91986206610
> >>
> >> Thanks,
> >> Botong
> >>
> >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
> >>
> >>> Hi Stamatis and all,
> >>>
> >>> Thanks for the interest! Let's tentatively schedule the next meeting
> >>> for next Wednesday, May 12, 10pm-11pm PST. Please let us know if any
> >>> new needs come up.
> >>>
> >>> Best,
> >>> Botong
> >>>
> >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I really regret missing the first meeting, sorry about that. I added
> >>>> my preferences in the document.
> >>>> I will make sure to attend the next one and help as much as I can.
> >>>>
> >>>> I didn't have the chance yet to go over the paper but will try to do
> >>>> it before the next meeting.
> >>>>
> >>>> For me the following dates are more convenient than others so it would
> >>>> be nice if we could arrange it then.
> >>>>
> >>>> Thu, May 6, 10pm PST
> >>>> Tue, May 12, 10pm PST
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>> I have added my time preferences to the doc [1]. I am generally
> >>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
> >>>>>
> >>>>> Stamatis, Jesus, Given the complexity of this work, I would very much
> >>>>> appreciate your insight, as experts in optimizer theory. Could one of
> >>>>> you join the next meeting? Of course we should choose a time that
> >>>>> works for everyone's schedule.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>
> >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> We didn't record it, but we will try to record the following meetings.
> >>>>>> Please add your time preference in the doc, so that we can find a
> >>>>>> meeting time that works for more people.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Botong
> >>>>>>
> >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
> >> viliam@hazelcast.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Is there a recording available?
> >>>>>>> Viliam
> >>>>>>>
> >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> The meeting yesterday was fun and productive. As discussed, this is
> >>>>>>>> the call to schedule our second meeting.
> >>>>>>>>
> >>>>>>>> We encourage everyone to add their time preferences during
> >>>>>>>> 05/01 - 05/15 here:
> >>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Botong
> >>>>>>>>
> >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>> We've created a zoom meeting below for our meeting next Monday
> >>>>>>>>> (9pm-10:30pm PST on 04/26).
> >>>>>>>>> Talk to you all soon!
> >>>>>>>>>
> >>>>>>>>> Join Zoom Meeting
> >>>>>>>>> https://uci.zoom.us/j/91279732686
> >>>>>>>>>
> >>>>>>>>> Meeting ID: 912 7973 2686
> >>>>>>>>> One tap mobile
> >>>>>>>>> +16699006833,,91279732686# US (San Jose)
> >>>>>>>>> +12532158782,,91279732686# US (Tacoma)
> >>>>>>>>>
> >>>>>>>>> Dial by your location
> >>>>>>>>> +1 669 900 6833 US (San Jose)
> >>>>>>>>> +1 253 215 8782 US (Tacoma)
> >>>>>>>>> +1 346 248 7799 US (Houston)
> >>>>>>>>> +1 301 715 8592 US (Washington DC)
> >>>>>>>>> +1 312 626 6799 US (Chicago)
> >>>>>>>>> +1 646 558 8656 US (New York)
> >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> >>>>>>>>>
> >>>>>>>>> Join by Skype for Business
> >>>>>>>>> https://uci.zoom.us/skype/91279732686
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Botong
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
> >> pkuhbt@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> According to the preferences collected, we are tentatively
> >>>>>>>>>> scheduling our meeting at 9pm-10:30pm PST on Monday, 04/26.
> >>>>>>>>>>
> >>>>>>>>>> We will give a presentation about Tempura, followed by a free
> >>>>>>>>>> discussion.
> >>>>>>>>>>
> >>>>>>>>>> Please let us know if there are any other requests. A few days
> >>>>>>>>>> before the meeting, I will send out a Zoom meeting link.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Botong
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
> >> pkuhbt@gmail.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Julian and all,
> >>>>>>>>>>>
> >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take a
> >>>>>>>>>>> quick peek at the last five commits.
> >>>>>>>>>>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> >>>>>>>>>>>
> >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
> >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which
> >>>>>>>>>>> will serve as the umbrella Jira for the feature.
> >>>>>>>>>>>
> >>>>>>>>>>> In the meantime, we encourage everyone to enter their time
> >>>>>>>>>>> preferences for our first meeting here:
> >>>>>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Botong
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> >>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I have added my time preferences to the doc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Initial discussions will need to be about architecture and
> >>>>>>>>>>>> high-level design. So I would ask Calcite reviewers not to review
> >>>>>>>>>>>> the PR line-by-line (or to leave comments in GitHub) but to try to
> >>>>>>>>>>>> understand the design holistically, and prepare questions/comments
> >>>>>>>>>>>> before the meeting.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Botong, can you please create a Calcite JIRA case for this task?
> >>>>>>>>>>>> JIRA is how we track long-running tasks such as this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
> >> pkuhbt@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apologies for the delay. It took us some time to clean up our
> >>>>>>>>>>>>> code base and publicly release it (which will be out soon) for a
> >>>>>>>>>>>>> quick peek.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time for a
> >>>>>>>>>>>>> Zoom meeting and discuss how to integrate Tempura into Calcite.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Since some of our team members are in China, we prefer the time
> >>>>>>>>>>>>> slot of 7:00pm-11:30pm PST any day. I've added our time
> >>>>>>>>>>>>> preference in the shared doc below.
> >>>>>>>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We encourage everyone to add their time preferences (during
> >>>>>>>>>>>>> 04/15-04/30) in this doc. In a week or so, we will try to settle
> >>>>>>>>>>>>> a time that works for most.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> >>>>> pkuhbt@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Julian and Rui,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
> >>>> some
> >>>>>>> slides
> >>>>>>>>>>>> for the
> >>>>>>>>>>>>>> meeting.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
> >> free
> >>>> to
> >>>>> add
> >>>>>>>>>>>> more in
> >>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
> >>>>> idea. I
> >>>>>>>>>>>> think we
> >>>>>>>>>>>>>>> should create it to continue discussion after the first
> >>>>> meeting.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
> >>>> The
> >>>>> PR
> >>>>>>> will
> >>>>>>>>>>>> allow
> >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
> >>>>> round of
> >>>>>>>>>>>> questions
> >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
> >>>>>>>>>>>> presentation of the
> >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
> >>>> present
> >>>>> at
> >>>>>>>> VLDB,
> >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
> >>>> concepts,
> >>>>> which
> >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
> >>>> onto
> >>>>>>> other
> >>>>>>>>>>>> current
> >>>>>>>>>>>>>>> and future concepts in calcite.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
> >>>>> line-by-line
> >>>>>>> at
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
> >> and
> >>>>> design
> >>>>>>>>>>>> choices. If
> >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
> >>>> details.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
> >> doubt
> >>>>> that we
> >>>>>>>>>>>> will be
> >>>>>>>>>>>>>>> able to integrate everything, but we can build
> >>>> understanding
> >>>>>>> about
> >>>>>>>>>>>> where
> >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
> >>>> of
> >>>>> code
> >>>>>>> to
> >>>>>>>>>>>> help us
> >>>>>>>>>>>>>>> get there.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As I said before, after the integration I would like people to
> >>>>>>>>>>>>>>>> be able to experiment with it and use it in their production
> >>>>>>>>>>>>>>>> systems. That way, it will not be an experiment that withers,
> >>>>>>>>>>>>>>>> but a feature set that integrates with other Calcite features
> >>>>>>>>>>>>>>>> and gets stronger over time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> >>>>> amaliujia@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For me to participate in the discussion of the above questions,
> >>>>>>>>>>>>>>>>> I will need to read a lot more to know the relevant context and
> >>>>>>>>>>>>>>>>> will likely ask lots of questions :-). An editable doc is
> >>>>>>>>>>>>>>>>> probably good for questions and back-and-forth discussion.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Rui
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> >>>>>>>> amaliujia@apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
> >>>>> (review
> >>>>>>>> code
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>> doc,
> >>>>>>>>>>>>>>>>>> etc.).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> While you can share your code so people can get a better idea
> >>>>>>>>>>>>>>>>>> of how it is implemented, I think it would also be nice to have
> >>>>>>>>>>>>>>>>>> a doc to discuss the open questions above. I copy some of the
> >>>>>>>>>>>>>>>>>> points here:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing solutions in
> >>>>>>>>>>>>>>>>>>    Calcite for streaming, materialized view maintenance, and
> >>>>>>>>>>>>>>>>>>    multi-query optimization (Sigma and Delta relational
> >>>>>>>>>>>>>>>>>>    operators, lattice, and Spool operator)?
> >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost models - one
> >>>>>>>>>>>>>>>>>>    for “view maintenance” and another for “user queries” - since
> >>>>>>>>>>>>>>>>>>    the objectives of each activity are so different?
> >>>>>>>>>>>>>>>>>> 3. Will this work hasten the arrival of multi-objective
> >>>>>>>>>>>>>>>>>>    parametric query optimization [1] in Calcite?
> >>>>>>>>>>>>>>>>>> 4. Probably SQL shell support.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1]: https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Rui
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> >>>>> zinking3@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> >>>>>>>>>>>> pkuhbt@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Julian,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
> >>>>> wondering
> >>>>>>> if
> >>>>>>>> it
> >>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>> help
> >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> >>>>>>>> pkuhbt@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Julian,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
> >>>> plan
> >>>>>>> that
> >>>>>>>>>>>> best
> >>>>>>>>>>>>>>>>>>> benefits
> >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
> >>>>> hopefully
> >>>>>>>>>>>> answer
> >>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>>>> questions.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of time points at
> >>>>>>>>>>>>>>>>>>>>> which to consider running, and a cost function that expresses
> >>>>>>>>>>>>>>>>>>>>> their preference over time. Tempura then generates the best
> >>>>>>>>>>>>>>>>>>>>> incremental plan, i.e. the one that minimizes the overall cost
> >>>>>>>>>>>>>>>>>>>>> function.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at different time points
> >>>>>>>>>>>>>>>>>>>>> can be different from each other, as opposed to identical plans
> >>>>>>>>>>>>>>>>>>>>> in all delta runs as in streaming or IVM. As mentioned in
> >>>>>>>>>>>>>>>>>>>>> Section 2.1 of the Tempura paper, we can mimic the current
> >>>>>>>>>>>>>>>>>>>>> streaming implementation by specifying two (logical) time points
> >>>>>>>>>>>>>>>>>>>>> in Tempura, representing the initial run and later delta runs
> >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura supports various
> >>>>>>>>>>>>>>>>>>>>> forms of incremental computing, not only the small-delta
> >>>>>>>>>>>>>>>>>>>>> append-only data model in streaming systems. That's why we
> >>>>>>>>>>>>>>>>>>>>> believe Tempura subsumes the current streaming support, as well
> >>>>>>>>>>>>>>>>>>>>> as any IVM implementations.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a separate cost model,
> >>>>>>>>>>>>>>>>>>>>> but rather extended the existing one. Similar to multi-objective
> >>>>>>>>>>>>>>>>>>>>> optimization, costs incurred at different time points are considered
> >>>>>>>>>>>>>>>>>>>>> different dimensions. Tempura lets users supply a function that
> >>>>>>>>>>>>>>>>>>>>> converts this cost vector into a final cost. So under this function,
> >>>>>>>>>>>>>>>>>>>>> any two incremental plans are still comparable and there is an overall
> >>>>>>>>>>>>>>>>>>>>> optimum. I guess we can go down the route of multi-objective parametric
> >>>>>>>>>>>>>>>>>>>>> query optimization instead if there is a need.
> >>>>>>>>>>>>>>>>>>>>>
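A minimal sketch of the cost-vector idea in the paragraph above, under stated assumptions: this is not Tempura's API, and the plan names, the cost numbers, and the helpers `weighted_sum` and `best_plan` are hypothetical. Each plan carries one cost per time point, and a user-supplied function collapses that vector into a single number so that any two plans can be compared.

```python
from typing import Callable, Dict, Sequence

CostVector = Sequence[float]  # one entry per time point (hypothetical representation)


def weighted_sum(weights: Sequence[float]) -> Callable[[CostVector], float]:
    """Build a user cost function that weights each time point's cost."""
    def cost(vec: CostVector) -> float:
        if len(vec) != len(weights):
            raise ValueError("cost vector and weights must have the same length")
        return sum(w * c for w, c in zip(weights, vec))
    return cost


def best_plan(plans: Dict[str, CostVector],
              cost_fn: Callable[[CostVector], float]) -> str:
    """Return the name of the plan with the lowest final (scalar) cost."""
    return min(plans, key=lambda name: cost_fn(plans[name]))


# Two made-up plans over two time points: [cost at t1, cost at t2].
plans = {
    "batch_at_end": [0.0, 100.0],   # do nothing early, everything at t2
    "incremental": [40.0, 30.0],    # pre-compute at t1, finish cheaply at t2
}
```

With equal weights the incremental plan wins (70 versus 100). If resources at the first time point are weighted as three times more expensive and the second as half price, the same search picks the batch plan (50 versus 135), which is the sense in which the user's function decides the optimum.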
> >>>>>>>>>>>>>>>>>>>>> Next, on materialized views and multi-query optimization: since our
> >>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves materializing intermediate
> >>>>>>>>>>>>>>>>>>>>> results for later time points, we need to solve the problem of choosing
> >>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving and reusing the
> >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing plans. We borrowed the
> >>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this problem even though
> >>>>>>>>>>>>>>>>>>>>> we are looking at a single query. As a result, we think our work is
> >>>>>>>>>>>>>>>>>>>>> orthogonal to Calcite's facilities around utilizing existing views,
> >>>>>>>>>>>>>>>>>>>>> lattice etc. We do feel that the multi-query optimization component can
> >>>>>>>>>>>>>>>>>>>>> be adopted for wider use, but we probably need more suggestions from
> >>>>>>>>>>>>>>>>>>>>> the community.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in Java code; it should
> >>>>>>>>>>>>>>>>>>>>> be straightforward to hook it up with a SQL shell.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Botong,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this research, and thank
> >>>>>>>>>>>>>>>>>>>>>> you for contributing it back to Calcite.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite: streaming,
> >>>>>>>>>>>>>>>>>>>>>> materialized view maintenance, and multi-query optimization. As we
> >>>>>>>>>>>>>>>>>>>>>> already have some solutions in those areas (Sigma and Delta relational
> >>>>>>>>>>>>>>>>>>>>>> operators, lattice, and Spool operator), it will be interesting to see
> >>>>>>>>>>>>>>>>>>>>>> whether we can make them compatible, or whether one concept can
> >>>>>>>>>>>>>>>>>>>>>> subsume others.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that your relations are
> >>>>>>>>>>>>>>>>>>>>>> used by “external” user queries, whereas in pure streaming queries,
> >>>>>>>>>>>>>>>>>>>>>> the only activity is the change propagation. Did you find that you
> >>>>>>>>>>>>>>>>>>>>>> needed two separate cost models - one for “view maintenance” and
> >>>>>>>>>>>>>>>>>>>>>> another for “user queries” - since the objectives of each activity
> >>>>>>>>>>>>>>>>>>>>>> are so different?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the arrival of
> >>>>>>>>>>>>>>>>>>>>>> multi-objective parametric query optimization [1] in Calcite.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read and digest your
> >>>>>>>>>>>>>>>>>>>>>> paper. Then I expect that we will have a back-and-forth process to
> >>>>>>>>>>>>>>>>>>>>>> create something that will be useful for the broader community.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making this functionality
> >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can experiment with this
> >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or setting up complex
> >>>>>>>>>>>>>>>>>>>>>> databases and metadata. I have in mind something like the simple DDL
> >>>>>>>>>>>>>>>>>>>>>> operations that are available in Calcite’s ’server’ module. I wonder
> >>>>>>>>>>>>>>>>>>>>>> whether we could devise some kind of SQL syntax for a “multi-query”.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> >>>>>>>> pkuhbt@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please refer
> >>>>>>>>>>>>>>>>>>>>>>> to Fig 3(a) in our paper:
> >>>>>>>>>>>>>>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> >>>>>>>>>>>> taojiatao@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could
> >>>>>>>>>>>>>>>>>>>>>>>> you open a JIRA for this, so that people who are interested can
> >>>>>>>>>>>>>>>>>>>>>>>> subscribe to it?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regards!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM, Botong Huang <bo...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our research paper published
> >>>>>>>>>>>>>>>>>>>>>>>>> in VLDB 2021:
> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental
> >>>>>>>>>>>>>>>>>>>>>>>>> data processing
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> >>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental query optimizer to
> >>>>>>>>>>>>>>>>>>>>>>>>> alleviate cluster-wise resource skewness:
> >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> >>>>>>>>>>>>>>>>>>>>>>>>> Computing
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To our best knowledge, this is the first general cost-based
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental optimizer that can find the best plan across multiple
> >>>>>>>>>>>>>>>>>>>>>>>>> families of incremental computing methods, including IVM,
> >>>>>>>>>>>>>>>>>>>>>>>>> Streaming, DBToaster, etc. Experiments (in the paper) show that
> >>>>>>>>>>>>>>>>>>>>>>>>> the generated best plan is consistently much better than the plans
> >>>>>>>>>>>>>>>>>>>>>>>>> from each individual method alone.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is central to database view
> >>>>>>>>>>>>>>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in
> >>>>>>>>>>>>>>>>>>>>>>>>> active databases, resumable query execution, approximate query
> >>>>>>>>>>>>>>>>>>>>>>>>> processing, etc. We are hoping that this feature can help widen
> >>>>>>>>>>>>>>>>>>>>>>>>> the spectrum of Calcite, and solicit more use cases and adoption
> >>>>>>>>>>>>>>>>>>>>>>>>> of Calcite.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical details. Please
> >>>>>>>>>>>>>>>>>>>>>>>>> refer to the Tempura paper for more details. We are also working
> >>>>>>>>>>>>>>>>>>>>>>>>> on a journal version of the paper with more implementation details.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be
> >>>>>>>>>>>>>>>>>>>>>>>>> executed altogether at once. In the proposal, Calcite’s memo will
> >>>>>>>>>>>>>>>>>>>>>>>>> be extended with temporal information so that it is capable of
> >>>>>>>>>>>>>>>>>>>>>>>>> generating incremental plans that include multiple sub-plans to
> >>>>>>>>>>>>>>>>>>>>>>>>> execute at different time points.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one that changes over time
> >>>>>>>>>>>>>>>>>>>>>>>>> (Time Varying Relations (TVR)). To achieve that we introduced
> >>>>>>>>>>>>>>>>>>>>>>>>> TvrMetaSet into Calcite’s memo besides RelSet and RelSubset to
> >>>>>>>>>>>>>>>>>>>>>>>>> track related RelSets of a changing table (e.g. snapshot of the
> >>>>>>>>>>>>>>>>>>>>>>>>> table at a certain time, delta of the table between two time
> >>>>>>>>>>>>>>>>>>>>>>>>> points, etc.).
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each vertical line is a
> >>>>>>>>>>>>>>>>>>>>>>>>> TvrMetaSet representing a TVR (S, R, S left outer join R, etc.).
> >>>>>>>>>>>>>>>>>>>>>>>>> Horizontal lines represent time. Each black dot in the grid is a
> >>>>>>>>>>>>>>>>>>>>>>>>> RelSet. Users can write TVR Rewrite Rules to describe valid
> >>>>>>>>>>>>>>>>>>>>>>>>> transformations between these dots. For example, the blue lines
> >>>>>>>>>>>>>>>>>>>>>>>>> are inter-TVR rules that describe how to compute a certain RelSet
> >>>>>>>>>>>>>>>>>>>>>>>>> of a TVR from RelSets of other TVRs. The red lines are intra-TVR
> >>>>>>>>>>>>>>>>>>>>>>>>> rules that describe transformations within a TVR. All TVR rewrite
> >>>>>>>>>>>>>>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still work in
> >>>>>>>>>>>>>>>>>>>>>>>>> the new volcano system without modification.
> >>>>>>>>>>>>>>>>>>>>>>>>>
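A minimal sketch of the memo extension described above, under stated assumptions: this is not the Tempura or Calcite code. The class name `TvrMetaSet` is taken from the proposal, but its fields, the key helpers `snapshot` and `delta`, the integer RelSet ids, and the rule `merge_rule` are all hypothetical. It models one TVR as a map from "snapshot at t" or "delta between t1 and t2" to a RelSet, plus one toy intra-TVR rule that is valid for append-only data: the snapshot at a later time is the earlier snapshot combined with the delta in between.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def snapshot(t: int) -> Tuple:
    """Key for the RelSet holding the table's contents at time t."""
    return ("snapshot", t)


def delta(t_from: int, t_to: int) -> Tuple:
    """Key for the RelSet holding the changes between two time points."""
    return ("delta", t_from, t_to)


@dataclass
class TvrMetaSet:
    """One time-varying relation: maps snapshot/delta keys to RelSet ids."""
    name: str
    rel_sets: Dict[Tuple, int] = field(default_factory=dict)

    def register(self, key: Tuple, rel_set_id: int) -> None:
        self.rel_sets[key] = rel_set_id


def merge_rule(tvr: TvrMetaSet, t_from: int, t_to: int) -> Optional[Tuple]:
    """Toy intra-TVR rule (append-only assumption):
    snapshot(t_to) = snapshot(t_from) UNION ALL delta(t_from, t_to).

    Returns a symbolic expression over RelSet ids, or None when the rule
    does not match because one of its inputs is missing.
    """
    base = tvr.rel_sets.get(snapshot(t_from))
    change = tvr.rel_sets.get(delta(t_from, t_to))
    if base is None or change is None:
        return None
    return ("union_all", base, change)


# TVR "S" with a snapshot at t=1 (RelSet 10) and a delta from 1 to 2 (RelSet 11).
s = TvrMetaSet("S")
s.register(snapshot(1), 10)
s.register(delta(1, 2), 11)
```

Here `merge_rule(s, 1, 2)` matches and yields a union over RelSets 10 and 11, while `merge_rule(s, 2, 3)` returns None because neither input exists yet. An inter-TVR rule would look similar but read its inputs from the TvrMetaSets of other relations.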
> >>>>>>>>>>>>>>>>>>>>>>>>> All changes in this feature will consist of four parts:
> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
> >>>>>>>>>>>>>>>>>>>>>>>>> RelNodes, as well as links in between the nodes.
> >>>>>>>>>>>>>>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine
> >>>>>>>>>>>>>>>>>>>>>>>>> API.
> >>>>>>>>>>>>>>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental
> >>>>>>>>>>>>>>>>>>>>>>>>> plan involving multiple time points.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Note that this feature is an extension in nature and thus when
> >>>>>>>>>>>>>>>>>>>>>>>>> disabled, does not change any existing Calcite behavior.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Other than scenarios in the paper, we also applied this
> >>>>>>>>>>>>>>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic
> >>>>>>>>>>>>>>>>>>>>>>>>> query called the ‘‘range query’’ in Alibaba’s data warehouse. It
> >>>>>>>>>>>>>>>>>>>>>>>>> achieved cost savings of 80% on total CPU and memory consumption,
> >>>>>>>>>>>>>>>>>>>>>>>>> and 60% on end-to-end execution time.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
> >>>>>>>>>>>>>>>>>>>>>>>>> holidays!
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~
> >>>>>>>>>>>>>>>>>>> no mistakes
> >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Viliam Durina
> >>>>>>> Jet Developer
> >>>>>>>      hazelcast®
> >>>>>>>
> >>>>>>>  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo,
> >> CA
> >>>>> 94402 |
> >>>>>>> USA
> >>>>>>> +1 (650) 521-5453 <(650)%20521-5453> | hazelcast.com <
> >> https://www.hazelcast.com>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Forward Xu <fo...@gmail.com>.
Thanks for sharing +1


forward

On Fri, May 14, 2021 at 2:02 PM, Albert <zi...@gmail.com> wrote:

> Thanks for sharing
>
>
> On Friday, May 14, 2021, Julian Hyde <jh...@gmail.com> wrote:
>
> > During the meeting we agreed to start progressing this contribution in the
> > usual Apache Way, with conversations on the dev list and in the
> > https://issues.apache.org/jira/browse/CALCITE-4568 JIRA case. So, it
> > should be easy for you to participate.
> >
> > Botong said he would share the slides. (He might be unwilling to make them
> > public, because they are his presentation for a conference that has not
> > happened yet. Reach out to him one-to-one.)
> >
> > Next step is for someone on the Alibaba side to create a PR that is
> > rebased on the latest Calcite master, and add a comment to the JIRA case.
> > Then we can discuss what needs to be done for that PR. Code quality, adding
> > comments, breaking up into smaller commits, additional tests, renaming
> > packages/classes, restructuring into plugins are all possibilities.
> >
> > Our side of the bargain, as committers, is that we should review in a
> > timely manner, and not move the goal posts: if the contributors make the
> > changes we request then we will land this code in master in a reasonable
> > amount of time.
> >
> > We also discussed incremental view maintenance (IVM). Tempura solves a
> > more general problem (finding the optimal K steps to maintain a
> > materialized view as data arrives in K points in time) but if we set K=2,
> > we can generate a plan for how to update a materialized view given a delta
> > table. The plan will be different based on cost - e.g. whether the delta
> > table is small or large. This is a problem that many of our users would
> > like to solve. It will exercise much of Tempura’s code base, and encourage
> > contributions.
> >
> > In my opinion, we should do IVM at launch. It should be the main example
> > we use in conference talks, blog posts, etc. When people understand that
> > case, we can explain how we generalize from K=2 to arbitrary K.
> >
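To make the K=2 case above concrete, here is a minimal sketch under stated assumptions. It is not Tempura or Calcite code: the view is a toy SUM-per-key aggregate, and the helpers `recompute`, `apply_delta`, `maintain` and the constant `LOOKUP_COST` are hypothetical. The toy cost model charges one unit per row scanned and a higher, made-up price per keyed update, so a small delta favors patching the view and a large delta favors a full recompute.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)

LOOKUP_COST = 5  # made-up cost of one keyed update, relative to scanning one row


def recompute(base_rows: List[Row], delta_rows: List[Row]) -> Dict[str, int]:
    """Rebuild the SUM-per-key view from scratch over base plus delta."""
    view: Counter = Counter()
    for key, value in list(base_rows) + list(delta_rows):
        view[key] += value
    return dict(view)


def apply_delta(view: Dict[str, int], delta_rows: List[Row]) -> Dict[str, int]:
    """Patch an existing view with the delta rows only (insert-only delta)."""
    updated = dict(view)
    for key, value in delta_rows:
        updated[key] = updated.get(key, 0) + value
    return updated


def maintain(view: Dict[str, int], base_rows: List[Row],
             delta_rows: List[Row]) -> Tuple[str, Dict[str, int]]:
    """Pick the cheaper strategy under the toy cost model and run it."""
    cost_delta = LOOKUP_COST * len(delta_rows)
    cost_full = len(base_rows) + len(delta_rows)
    if cost_delta <= cost_full:
        return "apply_delta", apply_delta(view, delta_rows)
    return "recompute", recompute(base_rows, delta_rows)


base = [("a", 1), ("b", 2)] * 5          # 10 base rows
view = recompute(base, [])               # view at the first time point
small_delta = [("a", 10)]                # 1 new row
large_delta = [("c", 1)] * 10            # 10 new rows
```

With these numbers, `maintain(view, base, small_delta)` chooses the delta path (cost 5 versus 11) and `maintain(view, base, large_delta)` chooses the recompute path (cost 50 versus 20). Both strategies always produce the same view; only the cost differs, which is the point of letting an optimizer choose.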
> > Julian
> >
> >
> > > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> > >
> > > I apologize; I had the wrong impression of the meeting time (I thought
> > > it was on Thursday, but it is Wednesday). I can follow up on your
> > > meeting records if you have any.
> > >
> > >
> > > -Rui
> > >
> > > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> This is a reminder that we are going to have our second discussion
> > >> meeting tomorrow at 10-11pm PST. Please find the link below; everyone
> > >> is welcome to join!
> > >>
> > >> Join Zoom Meeting
> > >> https://uci.zoom.us/j/91986206610
> > >> <
> > >> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%
> > 2F91986206610&sa=D&source=calendar&usd=2&usg=AOvVaw24sxPtI6hbukCSo3nlQsbn
> > >>>
> > >>
> > >> Meeting ID: 919 8620 6610
> > >> One tap mobile
> > >> +16699006833 <(669)%20900-6833>,,91986206610# US (San Jose)
> > >> +12532158782 <(253)%20215-8782>,,91986206610# US (Tacoma)
> > >>
> > >> Dial by your location
> > >>        +1 669 900 6833 <(669)%20900-6833> US (San Jose)
> > >>        +1 253 215 8782 <(253)%20215-8782> US (Tacoma)
> > >>        +1 346 248 7799 <(346)%20248-7799> US (Houston)
> > >>        +1 301 715 8592 <(301)%20715-8592> US (Washington DC)
> > >>        +1 312 626 6799 <(312)%20626-6799> US (Chicago)
> > >>        +1 646 558 8656 <(646)%20558-8656> US (New York)
> > >> Meeting ID: 919 8620 6610
> > >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> > >> <
> > >> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%
> > 2FacyXcc43Cd&sa=D&source=calendar&usd=2&usg=AOvVaw2W08kj_8hEx44dryeZlXb6
> > >>>
> > >>
> > >> Join by Skype for Business
> > >> https://uci.zoom.us/skype/91986206610
> > >> <
> > >> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%
> > 2Fskype%2F91986206610&sa=D&source=calendar&usd=2&usg=
> > AOvVaw3w0M0YYbcjPyBXzNpyyk0Z
> > >>>
> > >>
> > >> Thanks,
> > >> Botong
> > >>
> > >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
> > >>
> > >>> Hi Stamatis and all,
> > >>>
> > >>> Thanks for the interest! Let's tentatively schedule the next meeting
> > >>> for next Wednesday, May 12, 10pm-11pm PST then. Please let us know if
> > >>> any new needs come up.
> > >>>
> > >>> Best,
> > >>> Botong
> > >>>
> > >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <
> zabetak@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hello,
> > >>>>
> > >>>> I really regret missing the first meeting, sorry about that. I added
> > >>>> my preferences in the document.
> > >>>> I will make sure to attend the next one and help as much as I can.
> > >>>>
> > >>>> I didn't have the chance yet to go over the paper but will try to do
> > >>>> it before the next meeting.
> > >>>>
> > >>>> For me the following dates are more convenient than others so it
> > >>>> would be nice if we could arrange it then.
> > >>>>
> > >>>> Thu, May 6, 10pm PST
> > >>>> Tue, May 12, 10pm PST
> > >>>>
> > >>>> Best,
> > >>>> Stamatis
> > >>>>
> > >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org>
> wrote:
> > >>>>
> > >>>>> I have added my time preferences to the doc [1]. I am generally
> > >>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
> > >>>>>
> > >>>>> Stamatis, Jesus, Given the complexity of this work, I would very
> > >>>>> much appreciate your insight, as experts in optimizer theory. Could
> > >>>>> one of you join the next meeting? Of course we should choose a time
> > >>>>> that works for everyone's schedule.
> > >>>>>
> > >>>>> Julian
> > >>>>>
> > >>>>> [1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>
> > >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
> > >> wrote:
> > >>>>>>
> > >>>>>> We didn't record it; we will try to record the following meetings.
> > >>>>>> Please add your time preference in the doc, so that we can find a
> > >>>>>> meeting time that works for more people.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Botong
> > >>>>>>
> > >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
> > >> viliam@hazelcast.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> Is there a recording available?
> > >>>>>>> Viliam
> > >>>>>>>
> > >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> The meeting yesterday was fun and productive. As discussed, this
> > >>>>>>>> is the call to schedule our second meeting.
> > >>>>>>>>
> > >>>>>>>> We encourage everyone to add their time preferences during
> > >>>>>>>> 05/01 - 05/15 here:
> > >>>>>>>>
> > >>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Botong
> > >>>>>>>>
> > >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi all,
> > >>>>>>>>> We've created a zoom meeting below for our meeting next Monday
> > >>>>>>>>> (9pm-10:30pm PST on 04/26).
> > >>>>>>>>> Talk to you all soon!
> > >>>>>>>>>
> > >>>>>>>>> Join Zoom Meeting
> > >>>>>>>>> https://uci.zoom.us/j/91279732686
> > >>>>>>>>> <
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%
> > 2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Meeting ID: 912 7973 2686
> > >>>>>>>>> One tap mobile
> > >>>>>>>>> +16699006833 <(669)%20900-6833>,,91279732686# US (San Jose)
> > >>>>>>>>> +12532158782 <(253)%20215-8782>,,91279732686# US (Tacoma)
> > >>>>>>>>>
> > >>>>>>>>> Dial by your location
> > >>>>>>>>> +1 669 900 6833 <(669)%20900-6833> US (San Jose)
> > >>>>>>>>> +1 253 215 8782 <(253)%20215-8782> US (Tacoma)
> > >>>>>>>>> +1 346 248 7799 <(346)%20248-7799> US (Houston)
> > >>>>>>>>> +1 301 715 8592 <(301)%20715-8592> US (Washington DC)
> > >>>>>>>>> +1 312 626 6799 <(312)%20626-6799> US (Chicago)
> > >>>>>>>>> +1 646 558 8656 <(646)%20558-8656> US (New York)
> > >>>>>>>>> Meeting ID: 912 7973 2686
> > >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > >>>>>>>>> <
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%
> > 2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Join by Skype for Business
> > >>>>>>>>> https://uci.zoom.us/skype/91279732686
> > >>>>>>>>> <
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%
> > 2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=
> > AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Botong
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
> > >> pkuhbt@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi all,
> > >>>>>>>>>>
> > >>>>>>>>>> According to the preferences collected, we are tentatively
> > >>>>>>>>>> scheduling our meeting at 9pm-10:30pm PST on Monday, 04/26.
> > >>>>>>>>>>
> > >>>>>>>>>> We will give a presentation about Tempura, followed by a free
> > >>>>>>>>>> discussion.
> > >>>>>>>>>>
> > >>>>>>>>>> Please let us know if there are any other requests. A few days
> > >>>>>>>>>> before the meeting, I will send out a Zoom meeting link.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Botong
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
> > >> pkuhbt@gmail.com>
> > >>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Julian and all,
> > >>>>>>>>>>>
> > >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take a
> > >>>>>>>>>>> quick peek at the last five commits.
> > >>>>>>>>>>>
> > >>>>>>>>>>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > >>>>>>>>>>>
> > >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
> > >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which
> > >>>>>>>>>>> will serve as the umbrella Jira for the feature.
> > >>>>>>>>>>>
> > >>>>>>>>>>> In the meantime, we encourage everyone to enter the time
> > >>>>>>>>>>> preferences for our first meeting here:
> > >>>>>>>>>>>
> > >>>>>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>> Botong
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> > >>>>> jhyde.apache@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I have added my time preferences to the doc.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Initial discussions will need to be about architecture and
> > >>>>>>>>>>>> high-level design. So I would ask Calcite reviewers not to
> > >>>>>>>>>>>> review the PR line-by-line (or to leave comments in GitHub) but
> > >>>>>>>>>>>> try to understand the design holistically, and prepare
> > >>>>>>>>>>>> questions/comments before the meeting.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Botong, Can you please create a Calcite JIRA case for this
> > >>>>>>>>>>>> task? JIRA is how we track long-running tasks such as this.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Julian
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
> > >> pkuhbt@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Apology for the delay. It took us some time to clean up
> > >> our
> > >>>>> code
> > >>>>>>>> base
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
> > >>>> peek.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
> > >>>> for a
> > >>>>> Zoom
> > >>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
> > >> Calcite.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Since some of our team members are in China, we prefer
> > >> the
> > >>>>> time
> > >>>>>>> slot
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
> > >> preference
> > >>>> in
> > >>>>> the
> > >>>>>>>>>>>> shared
> > >>>>>>>>>>>>> doc below.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> We encourage everyone to add their time preferences
> > >> (during
> > >>>>>>>>>>>> 04/15-04/30) in
> > >>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
> > >>>> that
> > >>>>> works
> > >>>>>>>> for
> > >>>>>>>>>>>>> most.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> > >>>>> pkuhbt@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Julian and Rui,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
> > >>>> some
> > >>>>>>> slides
> > >>>>>>>>>>>> for the
> > >>>>>>>>>>>>>> meeting.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
> > >> free
> > >>>> to
> > >>>>> add
> > >>>>>>>>>>>> more in
> > >>>>>>>>>>>>>> here:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > >>>>>>>> jhyde.apache@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
> > >>>>> idea. I
> > >>>>>>>>>>>> think we
> > >>>>>>>>>>>>>>> should create it to continue discussion after the first
> > >>>>> meeting.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Julian
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > >>>>>>>> jhyde.apache@gmail.com>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
> > >>>> The
> > >>>>> PR
> > >>>>>>> will
> > >>>>>>>>>>>> allow
> > >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
> > >>>>> round of
> > >>>>>>>>>>>> questions
> > >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
> > >>>>>>>>>>>> presentation of the
> > >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
> > >>>> present
> > >>>>> at
> > >>>>>>>> VLDB,
> > >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
> > >>>> concepts,
> > >>>>> which
> > >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
> > >>>> onto
> > >>>>>>> other
> > >>>>>>>>>>>> current
> > >>>>>>>>>>>>>>> and future concepts in calcite.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
> > >>>>> line-by-line
> > >>>>>>> at
> > >>>>>>>>>>>> this
> > >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
> > >> and
> > >>>>> design
> > >>>>>>>>>>>> choices. If
> > >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
> > >>>> details.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
> > >> doubt
> > >>>>> that we
> > >>>>>>>>>>>> will be
> > >>>>>>>>>>>>>>> able to integrate everything, but we can build
> > >>>> understanding
> > >>>>>>> about
> > >>>>>>>>>>>> where
> > >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
> > >>>> of
> > >>>>> code
> > >>>>>>> to
> > >>>>>>>>>>>> help us
> > >>>>>>>>>>>>>>> get there.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> As I said before, after the integration I would like
> > >>>>>>>>>>>>>>>> people to be able to experiment with it and use it in
> > >>>>>>>>>>>>>>>> their production systems. That way, it will not be an
> > >>>>>>>>>>>>>>>> experiment that withers, but a feature set that
> > >>>>>>>>>>>>>>>> integrates with other Calcite features and gets
> > >>>>>>>>>>>>>>>> stronger over time.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Julian
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> > >>>>> amaliujia@apache.org>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> For me to participate in the discussion of the above
> > >>>>>>>>>>>>>>>>> questions, I will need to read a lot more to learn the
> > >>>>>>>>>>>>>>>>> relevant context, and will likely ask lots of
> > >>>>>>>>>>>>>>>>> questions :-). An editable doc is probably good for
> > >>>>>>>>>>>>>>>>> questions and back-and-forth discussion.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> -Rui
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > >>>>>>>> amaliujia@apache.org
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
> > >>>>> (review
> > >>>>>>>> code
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>>>> doc,
> > >>>>>>>>>>>>>>>>>> etc.).
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> While you can share your code so people can get a
> > >>>>>>>>>>>>>>>>>> better idea of how it is implemented, I think it would
> > >>>>>>>>>>>>>>>>>> also be nice to have a doc to discuss the open
> > >>>>>>>>>>>>>>>>>> questions above. Some points that I copied here:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing
> > >>>>>>>>>>>>>>>>>> solutions in Calcite for streaming, materialized view
> > >>>>>>>>>>>>>>>>>> maintenance, and multi-query optimization (Sigma and
> > >>>>>>>>>>>>>>>>>> Delta relational operators, lattice, and Spool
> > >>>>>>>>>>>>>>>>>> operator)?
> > >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost
> > >>>>>>>>>>>>>>>>>> models - one for “view maintenance” and another for
> > >>>>>>>>>>>>>>>>>> “user queries” - since the objectives of each activity
> > >>>>>>>>>>>>>>>>>> are so different?
> > >>>>>>>>>>>>>>>>>> 3. Whether this work will hasten the arrival of
> > >>>>>>>>>>>>>>>>>> multi-objective parametric query optimization [1] in
> > >>>>>>>>>>>>>>>>>> Calcite.
> > >>>>>>>>>>>>>>>>>> 4. Probably SQL shell support.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> [1]:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> -Rui
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> > >>>>> zinking3@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > >>>>>>>>>>>> pkuhbt@gmail.com>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Julian,
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
> > >>>>> wondering
> > >>>>>>> if
> > >>>>>>>> it
> > >>>>>>>>>>>>>>> would
> > >>>>>>>>>>>>>>>>>>> help
> > >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > >>>>>>>> pkuhbt@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Hi Julian,
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
> > >>>> plan
> > >>>>>>> that
> > >>>>>>>>>>>> best
> > >>>>>>>>>>>>>>>>>>> benefits
> > >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
> > >>>>> hopefully
> > >>>>>>>>>>>> answer
> > >>>>>>>>>>>>>>> your
> > >>>>>>>>>>>>>>>>>>>>> questions.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of
> > >>>> time
> > >>>>>>> points
> > >>>>>>>> to
> > >>>>>>>>>>>>>>>>>>> consider
> > >>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users'
> > >>>>>>> preference
> > >>>>>>>>>>>> over
> > >>>>>>>>>>>>>>>>>>> time,
> > >>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan
> > >>>> that
> > >>>>>>>>>>>> minimizes the
> > >>>>>>>>>>>>>>>>>>>> overall
> > >>>>>>>>>>>>>>>>>>>>> cost function.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at
> > >>>> different
> > >>>>> time
> > >>>>>>>>>>>> points
> > >>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to
> > >> identical
> > >>>>> plans
> > >>>>>>> in
> > >>>>>>>>>>>> all
> > >>>>>>>>>>>>>>> delta
> > >>>>>>>>>>>>>>>>>>>> runs
> > >>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in $2.1 of
> > >> the
> > >>>>>>> Tempura
> > >>>>>>>>>>>> paper,
> > >>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>>>> mimic the current streaming implementation by
> > >>>>> specifying
> > >>>>>>> two
> > >>>>>>>>>>>>>>> (logical)
> > >>>>>>>>>>>>>>>>>>>> time
> > >>>>>>>>>>>>>>>>>>>>> points in Tempura, representing the initial run
> > >> and
> > >>>>> later
> > >>>>>>>>>>>> delta
> > >>>>>>>>>>>>>>> runs
> > >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura supports
> > >>>>>>>>>>>>>>>>>>>>> various forms of incremental computing, not only the
> > >>>>>>>>>>>>>>>>>>>>> small-delta append-only data model in streaming
> > >>>>>>>>>>>>>>>>>>>>> systems. That's why we believe Tempura subsumes the
> > >>>>>>>>>>>>>>>>>>>>> current streaming support, as well as any IVM
> > >>>>>>>>>>>>>>>>>>>>> implementations.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a
> > >>>>>>>>>>>>>>>>>>>>> separate cost model, but rather extended the existing
> > >>>>>>>>>>>>>>>>>>>>> one. Similar to multi-objective optimization, costs
> > >>>>>>>>>>>>>>>>>>>>> incurred at different time points are considered
> > >>>>>>>>>>>>>>>>>>>>> different dimensions. Tempura lets users supply a
> > >>>>>>>>>>>>>>>>>>>>> function that converts this cost vector into a final
> > >>>>>>>>>>>>>>>>>>>>> cost. So under this function, any two incremental
> > >>>>>>>>>>>>>>>>>>>>> plans are still comparable and there is an overall
> > >>>>>>>>>>>>>>>>>>>>> optimum. I guess we can go down the route of
> > >>>>>>>>>>>>>>>>>>>>> multi-objective parametric query optimization instead
> > >>>>>>>>>>>>>>>>>>>>> if there is a need.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Next on materialized views and multi-query
> > >>>>> optimization,
> > >>>>>>>>>>>> since our
> > >>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves
> > >>>> materializing
> > >>>>>>>>>>>> intermediate
> > >>>>>>>>>>>>>>>>>>>> results
> > >>>>>>>>>>>>>>>>>>>>> for later time points, we need to solve the
> > >>>> problem of
> > >>>>>>>>>>>> choosing
> > >>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving
> > >> and
> > >>>>>>> reusing
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing
> > >> plans.
> > >>>> We
> > >>>>>>>>>>>> borrowed the
> > >>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this
> > >>>>> problem
> > >>>>>>>> even
> > >>>>>>>>>>>>>>> though
> > >>>>>>>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we
> > >>>> think
> > >>>>> our
> > >>>>>>>> work
> > >>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>>>> orthogonal
> > >>>>>>>>>>>>>>>>>>>>> to Calcite's facilities around utilizing existing
> > >>>>> views,
> > >>>>>>>>>>>> lattice
> > >>>>>>>>>>>>>>> etc.
> > >>>>>>>>>>>>>>>>>>> We
> > >>>>>>>>>>>>>>>>>>>> do
> > >>>>>>>>>>>>>>>>>>>>> feel that the multi-query optimization component
> > >>>> can
> > >>>>> be
> > >>>>>>>>>>>> adopted to
> > >>>>>>>>>>>>>>>>>>> wider
> > >>>>>>>>>>>>>>>>>>>>> use, but probably need more suggestions from the
> > >>>>>>> community.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in Java
> > >>>>>>>>>>>>>>>>>>>>> code; it should be straightforward to hook it up with
> > >>>>>>>>>>>>>>>>>>>>> a SQL shell.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Botong,
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this
> > >>>>> research,
> > >>>>>>>> and
> > >>>>>>>>>>>> thank
> > >>>>>>>>>>>>>>>>>>> you
> > >>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite:
> > >>>>> streaming,
> > >>>>>>>>>>>>>>>>>>> materialized
> > >>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization.
> > >>>> As we
> > >>>>>>> have
> > >>>>>>>>>>>> already
> > >>>>>>>>>>>>>>>>>>> some
> > >>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta
> > >>>> relational
> > >>>>>>>>>>>> operators,
> > >>>>>>>>>>>>>>>>>>> lattice,
> > >>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to
> > >> see
> > >>>>>>> whether
> > >>>>>>>>>>>> we can
> > >>>>>>>>>>>>>>>>>>> make
> > >>>>>>>>>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume
> > >>>>> others.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that
> > >>>> your
> > >>>>>>>>>>>> relations
> > >>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>> used
> > >>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure
> > >>>> streaming
> > >>>>>>>>>>>> queries, the
> > >>>>>>>>>>>>>>>>>>> only
> > >>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find
> > >>>>> that you
> > >>>>>>>>>>>> needed
> > >>>>>>>>>>>>>>> two
> > >>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view
> > >> maintenance”
> > >>>> and
> > >>>>>>>>>>>> another for
> > >>>>>>>>>>>>>>>>>>> “user
> > >>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity
> > >>>> are
> > >>>>> so
> > >>>>>>>>>>>> different?
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the
> > >>>> arrival of
> > >>>>>>>>>>>>>>> multi-objective
> > >>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read
> > >>>> and
> > >>>>>>> digest
> > >>>>>>>>>>>> your
> > >>>>>>>>>>>>>>>>>>> paper.
> > >>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth
> > >>>>> process
> > >>>>>>> to
> > >>>>>>>>>>>> create
> > >>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader
> > >>>>> community.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making
> > >> this
> > >>>>>>>>>>>> functionality
> > >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can
> > >>>>> experiment
> > >>>>>>>>>>>> with
> > >>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or
> > >>>> setting up
> > >>>>>>>> complex
> > >>>>>>>>>>>>>>>>>>> databases
> > >>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the
> > >> simple
> > >>>>> DDL
> > >>>>>>>>>>>> operations
> > >>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> > >>>>> whether
> > >>>>>>> we
> > >>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>>>> devise
> > >>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Julian
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > >>>>>>>> pkuhbt@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the figure,
> > >>>>>>>>>>>>>>>>>>>>>>> please refer to Fig 3(a) in our paper:
> > >>>>>>>>>>>>>>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>>>>>>>> Botong
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > >>>>>>>>>>>> taojiatao@gmail.com>
> > >>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting. The picture cannot be seen in the
> > >>>>>>>>>>>>>>>>>>>>>>>> mail; could you open a JIRA for this, so that people
> > >>>>>>>>>>>>>>>>>>>>>>>> who are interested can subscribe to it?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Regards!
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Botong Huang <bo...@apache.org> wrote on Thu, Dec 24,
> > >>>>>>>>>>>>>>>>>>>>>>>> 2020 at 3:18 AM:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite
> > >>>> optimizer
> > >>>>>>> into
> > >>>>>>>> a
> > >>>>>>>>>>>>>>> general
> > >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our
> > >>>> research
> > >>>>>>> paper
> > >>>>>>>>>>>>>>>>>>> published
> > >>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>>> VLDB
> > >>>>>>>>>>>>>>>>>>>>>>>>> 2021:
> > >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer
> > >>>> framework
> > >>>>> for
> > >>>>>>>>>>>>>>> incremental
> > >>>>>>>>>>>>>>>>>>>> data
> > >>>>>>>>>>>>>>>>>>>>>>>>> processing
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020
> > >> illustrating
> > >>>>> how
> > >>>>>>>>>>>> Alibaba’s
> > >>>>>>>>>>>>>>>>>>> data
> > >>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental
> > >>>>> query
> > >>>>>>>>>>>> optimizer
> > >>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>>>> alleviate
> > >>>>>>>>>>>>>>>>>>>>>>>>> cluster-wise resource skewness:
> > >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
> > >>>>> Resource-Aware
> > >>>>>>>>>>>>>>> Incremental
> > >>>>>>>>>>>>>>>>>>>>>>>> Computing
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> To our best knowledge, this is the first
> > >>>> general
> > >>>>>>>>>>>> cost-based
> > >>>>>>>>>>>>>>>>>>>>>> incremental
> > >>>>>>>>>>>>>>>>>>>>>>>>> optimizer that can find the best plan across
> > >>>>> multiple
> > >>>>>>>>>>>> families
> > >>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>>>> incremental computing methods, including IVM,
> > >>>>>>> Streaming,
> > >>>>>>>>>>>>>>>>>>> DBToaster,
> > >>>>>>>>>>>>>>>>>>>>>> etc.
> > >>>>>>>>>>>>>>>>>>>>>>>>> Experiments (in the paper) show that the generated
> > >>>>>>>>>>>>>>>>>>>>>>>>> best plan is consistently much better than the plans
> > >>>>>>>>>>>>>>>>>>>>>>>>> from each individual method alone.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is central to
> > >>>>>>>>>>>>>>>>>>>>>>>>> database view maintenance and stream processing
> > >>>>>>>>>>>>>>>>>>>>>>>>> systems, and is being adopted in active databases,
> > >>>>>>>>>>>>>>>>>>>>>>>>> resumable query execution, approximate query
> > >>>>>>>>>>>>>>>>>>>>>>>>> processing, etc. We are hoping that this feature can
> > >>>>>>>>>>>>>>>>>>>>>>>>> help widen the spectrum of Calcite and solicit more
> > >>>>>>>>>>>>>>>>>>>>>>>>> use cases and adoption of Calcite.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical
> > >>>>> details.
> > >>>>>>>>>>>> Please
> > >>>>>>>>>>>>>>>>>>> refer
> > >>>>>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>> Tempura paper for more details. We are also
> > >>>>> working
> > >>>>>>> on a
> > >>>>>>>>>>>>>>> journal
> > >>>>>>>>>>>>>>>>>>>>>> version
> > >>>>>>>>>>>>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>>>> the paper with more implementation details.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite
> > >>>> is
> > >>>>> meant
> > >>>>>>>> to
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>>> executed
> > >>>>>>>>>>>>>>>>>>>>>>>>> altogether at once. In the proposal,
> > >> Calcite’s
> > >>>>> memo
> > >>>>>>> will
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>>> extended
> > >>>>>>>>>>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>>>>>>>> temporal information so that it is capable of
> > >>>>>>> generating
> > >>>>>>>>>>>>>>>>>>> incremental
> > >>>>>>>>>>>>>>>>>>>>>>>> plans
> > >>>>>>>>>>>>>>>>>>>>>>>>> that include multiple sub-plans to execute at
> > >>>>>>> different
> > >>>>>>>>>>>> time
> > >>>>>>>>>>>>>>>>>>> points.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one
> > >> that
> > >>>>>>> changes
> > >>>>>>>>>>>> over
> > >>>>>>>>>>>>>>> time
> > >>>>>>>>>>>>>>>>>>>>>> (Time
> > >>>>>>>>>>>>>>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> > >>>>>>> introduced
> > >>>>>>>>>>>>>>>>>>> TvrMetaSet
> > >>>>>>>>>>>>>>>>>>>>>> into
> > >>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset
> > >> to
> > >>>>> track
> > >>>>>>>>>>>> related
> > >>>>>>>>>>>>>>>>>>> RelSets
> > >>>>>>>>>>>>>>>>>>>>>> of a
> > >>>>>>>>>>>>>>>>>>>>>>>>> changing table (e.g. snapshot of the table at
> > >>>>> certain
> > >>>>>>>>>>>> time,
> > >>>>>>>>>>>>>>>>>>> delta of
> > >>>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>> table between two time points, etc.).
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each
> > >> vertical
> > >>>>> line
> > >>>>>>> is a
> > >>>>>>>>>>>>>>>>>>> TvrMetaSet
> > >>>>>>>>>>>>>>>>>>>>>>>>> representing a TVR (S, R, S left outer join
> > >> R,
> > >>>>> etc.).
> > >>>>>>>>>>>>>>> Horizontal
> > >>>>>>>>>>>>>>>>>>>> lines
> > >>>>>>>>>>>>>>>>>>>>>>>>> represent time. Each black dot in the grid
> > >> is a
> > >>>>>>> RelSet.
> > >>>>>>>>>>>> Users
> > >>>>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>>>>>>> write
> > >>>>>>>>>>>>>>>>>>>>>>>> TVR
> > >>>>>>>>>>>>>>>>>>>>>>>>> Rewrite Rules to describe valid
> > >> transformations
> > >>>>>>> between
> > >>>>>>>>>>>> these
> > >>>>>>>>>>>>>>>>>>> dots.
> > >>>>>>>>>>>>>>>>>>>>>> For
> > >>>>>>>>>>>>>>>>>>>>>>>>> example, the blue lines are inter-TVR rules
> > >>>> that
> > >>>>>>>>>>>> describe how
> > >>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>>> compute
> > >>>>>>>>>>>>>>>>>>>>>>>>> certain RelSet of a TVR f
>
>
>
> --
> ~~~~~~~~~~~~~~~
> no mistakes
> ~~~~~~~~~~~~~~~~~~
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Albert <zi...@gmail.com>.
Thanks for sharing


On Friday, May 14, 2021, Julian Hyde <jh...@gmail.com> wrote:

> During the meeting we agreed to start progressing this contribution in the
> usual Apache Way, with conversations on the dev list and in the
> https://issues.apache.org/jira/browse/CALCITE-4568 JIRA case. So, it
> should be easy for you to participate.
>
> Botong said he would share the slides. (He might be unwilling to make them
> public, because they are his presentation for a conference that has not
> happened yet. Reach out to him one-to-one.)
>
> Next step is for someone on the Alibaba side to create a PR that is
> rebased on the latest Calcite master, and add a comment to the JIRA case.
> Then we can discuss what needs to be done for that PR. Code quality, adding
> comments, breaking up into smaller commits, additional tests, renaming
> packages/classes, restructuring into plugins are all possibilities.
>
> Our side of the bargain, as committers, is that we should review in a
> timely manner, and not move the goal posts — if the contributors make the
> changes we request then we will land this code in master in a reasonable
> amount of time.
>
> We also discussed incremental view maintenance (IVM). Tempura solves a
> more general problem (finding the optimal K steps to maintain a
> materialized view as data arrives in K points in time) but if we set K=2,
> we can generate a plan for how to update a materialized view given a delta
> table. The plan will be different based on cost - e.g. whether the delta
> table is small or large. This is a problem that many of our users would
> like to solve. It will exercise much of Tempura’s code base, and encourage
> contributions.
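
A minimal sketch of the K=2 case described in the paragraph above, written in Python for brevity (Calcite and Tempura themselves are Java). Everything in it is hypothetical: the `Plan` class, the `choose_maintenance_plan` function, and the per-row cost constants are illustrative assumptions, not Tempura's or Calcite's API or cost model. It only shows how the cheaper maintenance plan can flip depending on whether the delta table is small or large.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate way to refresh the materialized view, with its estimated cost."""
    name: str
    cost: float


def choose_maintenance_plan(
    base_rows: int,
    delta_rows: int,
    scan_cost_per_row: float = 1.0,   # assumed cost to scan one row
    merge_cost_per_row: float = 5.0,  # assumed cost to look up and merge one delta row
) -> Plan:
    """Return the cheaper of two hypothetical refresh strategies."""
    # Strategy 1: recompute the view from scratch over base + delta.
    recompute = Plan("full_recompute", scan_cost_per_row * (base_rows + delta_rows))
    # Strategy 2: touch only the delta rows, each at a higher per-row cost.
    incremental = Plan("incremental_merge", merge_cost_per_row * delta_rows)
    # Pick the plan with the lowest estimated cost.
    return min((recompute, incremental), key=lambda p: p.cost)


small_delta = choose_maintenance_plan(base_rows=1_000_000, delta_rows=100)
large_delta = choose_maintenance_plan(base_rows=1_000, delta_rows=10_000)
print(small_delta.name, large_delta.name)
```

With these assumed constants, a small delta against a large base favors the incremental merge, while a delta much larger than the base favors recomputing. A real optimizer would derive both costs from the plans themselves rather than from fixed per-row constants.
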
>
> In my opinion, we should do IVM at launch. It should be the main example
> we use in conference talks, blog posts, etc. When people understand that
> case, we can explain how we generalize from K=2 to arbitrary K.
>
> Julian
>
>
> > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> >
> > I apologize; I had the wrong impression of the meeting time (I thought
> > it was on Thursday, but it was Wednesday). I can follow up on your
> > meeting records if you have any.
> >
> >
> > -Rui
> >
> > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> This is a reminder that we are going to have our second discussion
> >> meeting tomorrow at 10-11pm PST. Please find the link below; everyone
> >> is welcome to join!
> >>
> >> Join Zoom Meeting
> >> https://uci.zoom.us/j/91986206610
> >>
> >> Meeting ID: 919 8620 6610
> >> One tap mobile
> >> +16699006833,,91986206610# US (San Jose)
> >> +12532158782,,91986206610# US (Tacoma)
> >>
> >> Dial by your location
> >>        +1 669 900 6833 US (San Jose)
> >>        +1 253 215 8782 US (Tacoma)
> >>        +1 346 248 7799 US (Houston)
> >>        +1 301 715 8592 US (Washington DC)
> >>        +1 312 626 6799 US (Chicago)
> >>        +1 646 558 8656 US (New York)
> >> Meeting ID: 919 8620 6610
> >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> >>
> >> Join by Skype for Business
> >> https://uci.zoom.us/skype/91986206610
> >>
> >> Thanks,
> >> Botong
> >>
> >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
> >>
> >>> Hi Stamatis and all,
> >>>
> >>> Thanks for the interest! Let's tentatively schedule the next meeting
> >>> for next Wednesday, May 12, 10pm-11pm PST, then. Please let us know if
> >>> any new needs come up.
> >>>
> >>> Best,
> >>> Botong
> >>>
> >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I really regret missing the first meeting, sorry about that. I added my
> >>>> preferences in the document.
> >>>> I will make sure to attend the next one and help as much as I can.
> >>>>
> >>>> I didn't have the chance yet to go over the paper but will try to do it
> >>>> before the next meeting.
> >>>>
> >>>> For me the following dates are more convenient than others so it would
> >>>> be nice if we could arrange it then.
> >>>>
> >>>> Thu, May 6, 10pm PST
> >>>> Tue, May 12, 10pm PST
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>> I have added my time preferences to the doc [1]. I am generally
> >>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
> >>>>>
> >>>>> Stamatis, Jesus, Given the complexity of this work, I would very much
> >>>>> appreciate your insight, as experts in optimizer theory. Could one of
> >>>>> you join the next meeting? Of course we should choose a time that
> >>>>> works for everyone's schedule.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>
> >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> We didn't record it; we will try to record the following meetings.
> >>>>>> Please add your time preference in the doc, so that we can find a
> >>>>>> meeting time that works for more people.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Botong
> >>>>>>
> >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
> >> viliam@hazelcast.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Is there a recording available?
> >>>>>>> Viliam
> >>>>>>>
> >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> The meeting yesterday was fun and productive. As discussed, this is
> >>>>>>>> the call to schedule our second meeting.
> >>>>>>>>
> >>>>>>>> We encourage everyone to add their time preferences during 05/01 -
> >>>>>>>> 05/15 here:
> >>>>>>>>
> >>>>>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Botong
> >>>>>>>>
> >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>> We've created a zoom meeting below for our meeting next Monday
> >>>>>>>>> (9pm-10:30pm PST on 04/26).
> >>>>>>>>> Talk to you all soon!
> >>>>>>>>>
> >>>>>>>>> Join Zoom Meeting
> >>>>>>>>> https://uci.zoom.us/j/91279732686
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Meeting ID: 912 7973 2686
> >>>>>>>>> One tap mobile
> >>>>>>>>> +16699006833,,91279732686# US (San Jose)
> >>>>>>>>> +12532158782,,91279732686# US (Tacoma)
> >>>>>>>>>
> >>>>>>>>> Dial by your location
> >>>>>>>>> +1 669 900 6833 US (San Jose)
> >>>>>>>>> +1 253 215 8782 US (Tacoma)
> >>>>>>>>> +1 346 248 7799 US (Houston)
> >>>>>>>>> +1 301 715 8592 US (Washington DC)
> >>>>>>>>> +1 312 626 6799 US (Chicago)
> >>>>>>>>> +1 646 558 8656 US (New York)
> >>>>>>>>> Meeting ID: 912 7973 2686
> >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Join by Skype for Business
> >>>>>>>>> https://uci.zoom.us/skype/91279732686
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Botong
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
> >> pkuhbt@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> According to the preferences collected, we are tentatively
> >>>>> scheduling
> >>>>>>>> our
> >>>>>>>>>> meeting at 9pm-10:30pm PST on 04/26 Monday.
> >>>>>>>>>>
> >>>>>>>>>> We will give a presentation about Tempura, followed by a free
> >>>>>>>> discussion.
> >>>>>>>>>>
> >>>>>>>>>> Please let us know if there are any other requests. A few days
> >>>>> before
> >>>>>>>>>> the meeting, I will send out a zoom meeting link.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Botong
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
> >> pkuhbt@gmail.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Julian and all,
> >>>>>>>>>>>
> >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take
> >> a
> >>>>> quick
> >>>>>>>> peek
> >>>>>>>>>>> at the last five commits.
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >> https://github.com/alibaba/cost-based-incremental-
> optimizer/commits/main
> >>>>>>>>>>>
> >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
> >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>),
> >> which
> >>>>> will
> >>>>>>>> serve
> >>>>>>>>>>> as the umbrella Jira for the feature.
> >>>>>>>>>>>
> >>>>>>>>>>> In the meantime, we encourage everyone to enter the time
> >>>>> preferences
> >>>>>>>> for
> >>>>>>>>>>> our first meeting here:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-
> 7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Botong
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> >>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I have added my time preferences to the doc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Initial discussions will need to be about architecture and
> >>>>>>> high-level
> >>>>>>>>>>>> design. So I would ask Calcite reviewers not to review the
> >> PR
> >>>>>>>> line-by-line
> >>>>>>>>>>>> (or to leave comments in GitHub) but try to understand the
> >>>>> design
> >>>>>>>>>>>> holistically, and prepare questions/comments before the
> >>>> meeting.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Botong, Can you please create a Calcite JIRA case for this
> >>>> task?
> >>>>>>> JIRA
> >>>>>>>>>>>> is how we track long-running tasks such as this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
> >> pkuhbt@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apologies for the delay. It took us some time to clean up
> >> our
> >>>>> code
> >>>>>>>> base
> >>>>>>>>>>>> and
> >>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
> >>>> peek.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
> >>>> for a
> >>>>> Zoom
> >>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
> >> Calcite.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Since some of our team members are in China, we prefer
> >> the
> >>>>> time
> >>>>>>> slot
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
> >> preference
> >>>> in
> >>>>> the
> >>>>>>>>>>>> shared
> >>>>>>>>>>>>> doc below.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-
> 7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We encourage everyone to add their time preferences
> >> (during
> >>>>>>>>>>>> 04/15-04/30) in
> >>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
> >>>> that
> >>>>> works
> >>>>>>>> for
> >>>>>>>>>>>>> most.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> >>>>> pkuhbt@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Julian and Rui,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
> >>>> some
> >>>>>>> slides
> >>>>>>>>>>>> for the
> >>>>>>>>>>>>>> meeting.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
> >> free
> >>>> to
> >>>>> add
> >>>>>>>>>>>> more in
> >>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-
> 7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
> >>>>> idea. I
> >>>>>>>>>>>> think we
> >>>>>>>>>>>>>>> should create it to continue discussion after the first
> >>>>> meeting.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
> >>>> The
> >>>>> PR
> >>>>>>> will
> >>>>>>>>>>>> allow
> >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
> >>>>> round of
> >>>>>>>>>>>> questions
> >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
> >>>>>>>>>>>> presentation of the
> >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
> >>>> present
> >>>>> at
> >>>>>>>> VLDB,
> >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
> >>>> concepts,
> >>>>> which
> >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
> >>>> onto
> >>>>>>> other
> >>>>>>>>>>>> current
> >>>>>>>>>>>>>>> and future concepts in calcite.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
> >>>>> line-by-line
> >>>>>>> at
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
> >> and
> >>>>> design
> >>>>>>>>>>>> choices. If
> >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
> >>>> details.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
> >> doubt
> >>>>> that we
> >>>>>>>>>>>> will be
> >>>>>>>>>>>>>>> able to integrate everything, but we can build
> >>>> understanding
> >>>>>>> about
> >>>>>>>>>>>> where
> >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
> >>>> of
> >>>>> code
> >>>>>>> to
> >>>>>>>>>>>> help us
> >>>>>>>>>>>>>>> get there.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As I said before, after the integration I would like
> >>>>> people to
> >>>>>>> be
> >>>>>>>>>>>> able
> >>>>>>>>>>>>>>> to experiment with it and use it in their production
> >>>>> systems.
> >>>>>>>> That
> >>>>>>>>>>>> way, it
> >>>>>>>>>>>>>>> will not be an experiment that withers, but a feature
> >> set
> >>>>>>>>>>>> that integrates with
> >>>>>>>>>>>>>>> other calcite features and gets stronger over time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> >>>>> amaliujia@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For me to participate in the discussion for the
> >> above
> >>>>>>>> questions,
> >>>>>>>>>>>> I
> >>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>> need to read a lot more to know relevant context and
> >>>>> likely
> >>>>>>> ask
> >>>>>>>>>>>> lots of
> >>>>>>>>>>>>>>>>> questions :-).  An editable doc is probably good for
> >>>>> questions
> >>>>>>>> and
> >>>>>>>>>>>> back
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> forward discussion.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Rui
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> >>>>>>>> amaliujia@apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
> >>>>> (review
> >>>>>>>> code
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>> doc,
> >>>>>>>>>>>>>>>>>> etc.).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> While you can share your code so people can have
> >> more
> >>>>> idea
> >>>>>>> how
> >>>>>>>>>>>> it is
> >>>>>>>>>>>>>>>>>> implemented, I think it would be also nice to have a
> >>>> doc
> >>>>> to
> >>>>>>>>>>>> discuss
> >>>>>>>>>>>>>>> open
> >>>>>>>>>>>>>>>>>> questions above. Some points that I copy those to
> >>>> here:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing
> >>>>> solutions in
> >>>>>>>>>>>> Calcite
> >>>>>>>>>>>>>>>>>> Streaming, materialized view maintenance, and
> >>>> multi-query
> >>>>>>>>>>>> optimization
> >>>>>>>>>>>>>>>>>> (Sigma and Delta relational operators, lattice, and
> >>>> Spool
> >>>>>>>>>>>> operator),
> >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost
> >>>> models
> >>>>> -
> >>>>>>> one
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>>> “view
> >>>>>>>>>>>>>>>>>> maintenance” and another for “user queries” - since
> >>>> the
> >>>>>>>>>>>> objectives of
> >>>>>>>>>>>>>>> each
> >>>>>>>>>>>>>>>>>> activity are so different?
> >>>>>>>>>>>>>>>>>> 3. whether this work will hasten the arrival of
> >>>>>>> multi-objective
> >>>>>>>>>>>>>>> parametric
> >>>>>>>>>>>>>>>>>> query optimization [1] in Calcite.
> >>>>>>>>>>>>>>>>>> 4. probably SQL shell support.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1]:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >> https://cacm.acm.org/magazines/2017/10/221322-
> multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Rui
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> >>>>> zinking3@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> >>>>>>>>>>>> pkuhbt@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Julian,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
> >>>>> wondering
> >>>>>>> if
> >>>>>>>> it
> >>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>> help
> >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> >>>>>>>> pkuhbt@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Julian,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
> >>>> plan
> >>>>>>> that
> >>>>>>>>>>>> best
> >>>>>>>>>>>>>>>>>>> benefits
> >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
> >>>>> hopefully
> >>>>>>>>>>>> answer
> >>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>>>> questions.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of
> >>>> time
> >>>>>>> points
> >>>>>>>> to
> >>>>>>>>>>>>>>>>>>> consider
> >>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users'
> >>>>>>> preference
> >>>>>>>>>>>> over
> >>>>>>>>>>>>>>>>>>> time,
> >>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan
> >>>> that
> >>>>>>>>>>>> minimizes the
> >>>>>>>>>>>>>>>>>>>> overall
> >>>>>>>>>>>>>>>>>>>>> cost function.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at
> >>>> different
> >>>>> time
> >>>>>>>>>>>> points
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to
> >> identical
> >>>>> plans
> >>>>>>> in
> >>>>>>>>>>>> all
> >>>>>>>>>>>>>>> delta
> >>>>>>>>>>>>>>>>>>>> runs
> >>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in §2.1 of
> >> the
> >>>>>>> Tempura
> >>>>>>>>>>>> paper,
> >>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>> mimic the current streaming implementation by
> >>>>> specifying
> >>>>>>> two
> >>>>>>>>>>>>>>> (logical)
> >>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>> points in Tempura, representing the initial run
> >> and
> >>>>> later
> >>>>>>>>>>>> delta
> >>>>>>>>>>>>>>> runs
> >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura
> >>>> supports
> >>>>>>> various
> >>>>>>>>>>>> forms
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> incremental computing, not only the small-delta
> >>>>>>> append-only
> >>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>> model in
> >>>>>>>>>>>>>>>>>>>>> streaming systems. That's why we believe Tempura
> >>>>> subsumes
> >>>>>>>> the
> >>>>>>>>>>>>>>> current
> >>>>>>>>>>>>>>>>>>>>> streaming support, as well as any IVM
> >>>> implementations.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a
> >>>>> separate
> >>>>>>>> cost
> >>>>>>>>>>>>>>> model,
> >>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>> rather extended the existing one. Similar to
> >>>>>>> multi-objective
> >>>>>>>>>>>>>>>>>>>> optimization,
> >>>>>>>>>>>>>>>>>>>>> costs incurred at different time points are
> >>>> considered
> >>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>> dimensions. Tempura lets users supply a function
> >>>> that
> >>>>>>>>>>>> converts this
> >>>>>>>>>>>>>>>>>>> cost
> >>>>>>>>>>>>>>>>>>>>> vector into a final cost. So under this function,
> >>>> any
> >>>>> two
> >>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>> plans
> >>>>>>>>>>>>>>>>>>>>> are still comparable and there is an overall
> >>>> optimum.
> >>>>> I
> >>>>>>>> guess
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> go
> >>>>>>>>>>>>>>>>>>>>> down the route of multi-objective parametric
> >> query
> >>>>>>>>>>>> optimization
> >>>>>>>>>>>>>>>>>>> instead
> >>>>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>> there is a need.
> >>>>>>>>>>>>>>>>>>>>>
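The cost-vector idea described above can be illustrated with a small sketch. It is not Tempura's or Calcite's API: every name below (CostVector, weighted_sum, cheaper_plan, the plan labels and the numbers) is hypothetical and chosen only for illustration. Each plan carries one cost per time point, and a user-supplied function collapses that vector into a single number so that any two incremental plans remain comparable.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical names throughout; none of these come from Tempura or Calcite.
CostVector = Sequence[float]                  # one cost entry per time point
CostFunction = Callable[[CostVector], float]  # user-supplied: vector -> final cost


def weighted_sum(weights: Sequence[float]) -> CostFunction:
    """Build a cost function that weights the cost incurred at each time point."""
    def cost(vector: CostVector) -> float:
        if len(vector) != len(weights):
            raise ValueError("cost vector and weights must have equal length")
        return sum(w * c for w, c in zip(weights, vector))
    return cost


def cheaper_plan(plans: Mapping[str, CostVector], cost_fn: CostFunction) -> str:
    """Return the name of the plan with the lowest final cost."""
    return min(plans, key=lambda name: cost_fn(plans[name]))


# Two made-up incremental plans over two time points.
plans = {
    "eager": [80.0, 20.0],  # does most of the work at the first time point
    "lazy": [10.0, 60.0],   # defers most of the work to the last time point
}

# Assumed preference: work at the last time point is charged three times as much.
peak_averse = weighted_sum([1.0, 3.0])
best = cheaper_plan(plans, peak_averse)
```

Under the assumed weights the "eager" plan wins; with equal weights the "lazy" plan would win instead, which is the sense in which the user's function, rather than a second cost model, decides the overall optimum.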
> >>>>>>>>>>>>>>>>>>>>> Next on materialized views and multi-query
> >>>>> optimization,
> >>>>>>>>>>>> since our
> >>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves
> >>>> materializing
> >>>>>>>>>>>> intermediate
> >>>>>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>> for later time points, we need to solve the
> >>>> problem of
> >>>>>>>>>>>> choosing
> >>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving
> >> and
> >>>>>>> reusing
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing
> >> plans.
> >>>> We
> >>>>>>>>>>>> borrowed the
> >>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this
> >>>>> problem
> >>>>>>>> even
> >>>>>>>>>>>>>>> though
> >>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we
> >>>> think
> >>>>> our
> >>>>>>>> work
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> orthogonal
> >>>>>>>>>>>>>>>>>>>>> to Calcite's facilities around utilizing existing
> >>>>> views,
> >>>>>>>>>>>> lattice
> >>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>>>>> feel that the multi-query optimization component
> >>>> can
> >>>>> be
> >>>>>>>>>>>> adopted to
> >>>>>>>>>>>>>>>>>>> wider
> >>>>>>>>>>>>>>>>>>>>> use, but probably need more suggestions from the
> >>>>>>> community.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in
> >>>> java
> >>>>> code,
> >>>>>>>> it
> >>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> straightforward to hook it up with SQL shell.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Botong,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this
> >>>>> research,
> >>>>>>>> and
> >>>>>>>>>>>> thank
> >>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite:
> >>>>> streaming,
> >>>>>>>>>>>>>>>>>>> materialized
> >>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization.
> >>>> As we
> >>>>>>> have
> >>>>>>>>>>>> already
> >>>>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta
> >>>> relational
> >>>>>>>>>>>> operators,
> >>>>>>>>>>>>>>>>>>> lattice,
> >>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to
> >> see
> >>>>>>> whether
> >>>>>>>>>>>> we can
> >>>>>>>>>>>>>>>>>>> make
> >>>>>>>>>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume
> >>>>> others.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that
> >>>> your
> >>>>>>>>>>>> relations
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>> used
> >>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure
> >>>> streaming
> >>>>>>>>>>>> queries, the
> >>>>>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find
> >>>>> that you
> >>>>>>>>>>>> needed
> >>>>>>>>>>>>>>> two
> >>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view
> >> maintenance”
> >>>> and
> >>>>>>>>>>>> another for
> >>>>>>>>>>>>>>>>>>> “user
> >>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity
> >>>> are
> >>>>> so
> >>>>>>>>>>>> different?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the
> >>>> arrival of
> >>>>>>>>>>>>>>> multi-objective
> >>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read
> >>>> and
> >>>>>>> digest
> >>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>> paper.
> >>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth
> >>>>> process
> >>>>>>> to
> >>>>>>>>>>>> create
> >>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader
> >>>>> community.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making
> >> this
> >>>>>>>>>>>> functionality
> >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can
> >>>>> experiment
> >>>>>>>>>>>> with
> >>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or
> >>>> setting up
> >>>>>>>> complex
> >>>>>>>>>>>>>>>>>>> databases
> >>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the
> >> simple
> >>>>> DDL
> >>>>>>>>>>>> operations
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> >>>>> whether
> >>>>>>> we
> >>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>> devise
> >>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >> https://cacm.acm.org/magazines/2017/10/221322-
> multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> >>>>>>>> pkuhbt@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the
> >>>>> figure,
> >>>>>>>> please
> >>>>>>>>>>>>>>> refer
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> Fig
> >>>>>>>>>>>>>>>>>>>>>>> 3(a) in our paper:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> >>>>>>>>>>>> taojiatao@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting, the pic can not be seen in
> >>>> the
> >>>>> mail,
> >>>>>>>>>>>> may you
> >>>>>>>>>>>>>>>>>>> open
> >>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> JIRA
> >>>>>>>>>>>>>>>>>>>>>>>> for this, people who are interested in this
> >> can
> >>>>>>> subscribe
> >>>>>>>>>>>> to the
> >>>>>>>>>>>>>>>>>>>> JIRA?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regards!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Botong Huang <bo...@apache.org>
> >> on Thu, Dec 24, 2020 at
> >>>>>>>> 3:18 AM wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite
> >>>> optimizer
> >>>>>>> into
> >>>>>>>> a
> >>>>>>>>>>>>>>> general
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our
> >>>> research
> >>>>>>> paper
> >>>>>>>>>>>>>>>>>>> published
> >>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> VLDB
> >>>>>>>>>>>>>>>>>>>>>>>>> 2021:
> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer
> >>>> framework
> >>>>> for
> >>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>>>>>>> processing
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020
> >> illustrating
> >>>>> how
> >>>>>>>>>>>> Alibaba’s
> >>>>>>>>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental
> >>>>> query
> >>>>>>>>>>>> optimizer
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> alleviate
> >>>>>>>>>>>>>>>>>>>>>>>>> cluster-wise resource skewness:
> >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
> >>>>> Resource-Aware
> >>>>>>>>>>>>>>> Incremental
> >>>>>>>>>>>>>>>>>>>>>>>> Computing
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To our best knowledge, this is the first
> >>>> general
> >>>>>>>>>>>> cost-based
> >>>>>>>>>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>>>>>>> optimizer that can find the best plan across
> >>>>> multiple
> >>>>>>>>>>>> families
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental computing methods, including IVM,
> >>>>>>> Streaming,
> >>>>>>>>>>>>>>>>>>> DBToaster,
> >>>>>>>>>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>>>>>>>> Experiments (in the paper) show that the
> >>>>> generated
> >>>>>>> best
> >>>>>>>>>>>> plan
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>> consistently much better than the plans from
> >>>> each
> >>>>>>>>>>>> individual
> >>>>>>>>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>>>>>>>>> alone.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is
> >>>> central
> >>>>> to
> >>>>>>>>>>>> database
> >>>>>>>>>>>>>>>>>>> view
> >>>>>>>>>>>>>>>>>>>>>>>>> maintenance and stream processing systems,
> >> and
> >>>> is
> >>>>>>> being
> >>>>>>>>>>>>>>> adopted
> >>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> active
> >>>>>>>>>>>>>>>>>>>>>>>>> databases, resumable query execution,
> >>>> approximate
> >>>>>>> query
> >>>>>>>>>>>>>>>>>>> processing,
> >>>>>>>>>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>>>>>>>>> are hoping that this feature can help
> >> widen
> >>>> the
> >>>>>>>>>>>> spectrum of
> >>>>>>>>>>>>>>>>>>>>>> Calcite,
> >>>>>>>>>>>>>>>>>>>>>>>>> solicit more use cases and adoption of
> >> Calcite.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical
> >>>>> details.
> >>>>>>>>>>>> Please
> >>>>>>>>>>>>>>>>>>> refer
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura paper for more details. We are also
> >>>>> working
> >>>>>>> on a
> >>>>>>>>>>>>>>> journal
> >>>>>>>>>>>>>>>>>>>>>> version
> >>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>> the paper with more implementation details.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite
> >>>> is
> >>>>> meant
> >>>>>>>> to
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>> executed
> >>>>>>>>>>>>>>>>>>>>>>>>> altogether at once. In the proposal,
> >> Calcite’s
> >>>>> memo
> >>>>>>> will
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>> extended
> >>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>> temporal information so that it is capable of
> >>>>>>> generating
> >>>>>>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>>>>>> plans
> >>>>>>>>>>>>>>>>>>>>>>>>> that include multiple sub-plans to execute at
> >>>>>>> different
> >>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>> points.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one
> >> that
> >>>>>>> changes
> >>>>>>>>>>>> over
> >>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>>> (Time
> >>>>>>>>>>>>>>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> >>>>>>> introduced
> >>>>>>>>>>>>>>>>>>> TvrMetaSet
> >>>>>>>>>>>>>>>>>>>>>> into
> >>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset
> >> to
> >>>>> track
> >>>>>>>>>>>> related
> >>>>>>>>>>>>>>>>>>> RelSets
> >>>>>>>>>>>>>>>>>>>>>> of a
> >>>>>>>>>>>>>>>>>>>>>>>>> changing table (e.g. snapshot of the table at
> >>>>> certain
> >>>>>>>>>>>> time,
> >>>>>>>>>>>>>>>>>>> delta of
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> table between two time points, etc.).
> >>>>>>>>>>>>>>>>>>>>>>>>>
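As a rough illustration of the time-varying relation idea described above (a snapshot of a table at a time point, and a delta between two time points), here is a toy sketch. It does not reproduce TvrMetaSet, RelSet or any Calcite class; TimeVaryingRelation, record, snapshot, delta and apply_delta are hypothetical names, and rows are modelled as a bag with signed count changes, which is an assumption of this sketch.

```python
from collections import Counter


class TimeVaryingRelation:
    """Toy TVR: a bag of rows recorded at each logical time point."""

    def __init__(self):
        self._snapshots = {}

    def record(self, t, rows):
        """Store the full contents of the relation as of time point t."""
        self._snapshots[t] = Counter(rows)

    def snapshot(self, t):
        """Return the bag of rows as of time point t."""
        return self._snapshots[t]

    def delta(self, t_from, t_to):
        """Signed row-count changes between two time points (zeros dropped)."""
        before, after = self._snapshots[t_from], self._snapshots[t_to]
        changes = {}
        for row in before.keys() | after.keys():
            diff = after[row] - before[row]  # Counter yields 0 for absent rows
            if diff:
                changes[row] = diff
        return changes


def apply_delta(snapshot, changes):
    """Apply signed changes to a snapshot and return the resulting bag."""
    result = Counter(snapshot)
    for row, diff in changes.items():
        result[row] += diff
    return Counter({row: n for row, n in result.items() if n > 0})


tvr = TimeVaryingRelation()
tvr.record(0, [("a", 1), ("b", 2)])
tvr.record(1, [("a", 1), ("c", 3), ("c", 3)])
changes = tvr.delta(0, 1)
```

In this toy model, applying the delta from time 0 to time 1 to the snapshot at time 0 reproduces the snapshot at time 1, which is the kind of relationship between related sets that the proposal says the memo tracks.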
> >>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each
> >> vertical
> >>>>> line
> >>>>>>> is a
> >>>>>>>>>>>>>>>>>>> TvrMetaSet
> >>>>>>>>>>>>>>>>>>>>>>>>> representing a TVR (S, R, S left outer join
> >> R,
> >>>>> etc.).
> >>>>>>>>>>>>>>> Horizontal
> >>>>>>>>>>>>>>>>>>>> lines
> >>>>>>>>>>>>>>>>>>>>>>>>> represent time. Each black dot in the grid
> >> is a
> >>>>>>> RelSet.
> >>>>>>>>>>>> Users
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>> write
> >>>>>>>>>>>>>>>>>>>>>>>> TVR
> >>>>>>>>>>>>>>>>>>>>>>>>> Rewrite Rules to describe valid
> >> transformations
> >>>>>>> between
> >>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>> dots.
> >>>>>>>>>>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>>>>>>> example, the blue lines are inter-TVR rules
> >>>> that
> >>>>>>>>>>>> describe how
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> compute
> >>>>>>>>>>>>>>>>>>>>>>>>> certain RelSet of a TVR f



-- 
~~~~~~~~~~~~~~~
no mistakes
~~~~~~~~~~~~~~~~~~

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Rui Wang <am...@apache.org>.
Thanks Julian for the sharing! Overall sounds reasonable!

>Our side of the bargain, as committers, is that we should review in a
timely manner, and not move the goal posts
Huge +1. I will be very happy to help during this process.

On Thu, May 13, 2021 at 12:47 PM Julian Hyde <jh...@gmail.com> wrote:

> During the meeting we agreed to start progressing this contribution in the
> usual Apache Way, with conversations on the dev list and in the
> https://issues.apache.org/jira/browse/CALCITE-4568 JIRA case. So, it
> should be easy for you to participate.
>
> Botong said he would share the slides. (He might be unwilling to make them
> public, because they are his presentation for a conference that has not
> happened yet. Reach out to him one-to-one.)
>
> Next step is for someone on the Alibaba side to create a PR that is
> rebased on the latest Calcite master, and add a comment to the JIRA case.
> Then we can discuss what needs to be done for that PR. Code quality, adding
> comments, breaking up into smaller commits, additional tests, renaming
> packages/classes, restructuring into plugins are all possibilities.
>
> Our side of the bargain, as committers, is that we should review in a
> timely manner, and not move the goal posts — if the contributors make the
> changes we request then we will land this code in master in a reasonable
> amount of time.
>
> We also discussed incremental view maintenance (IVM). Tempura solves a
> more general problem (finding the optimal K steps to maintain a
> materialized view as data arrives in K points in time) but if we set K=2,
> we can generate a plan for how to update a materialized view given a delta
> table. The plan will be different based on cost - e.g. whether the delta
> table is small or large. This is a problem that many of our users would
> like to solve. It will exercise much of Tempura’s code base, and encourage
> contributions.
>
> In my opinion, we should do IVM at launch. It should be the main example
> we use in conference talks, blog posts, etc. When people understand that
> case, we can explain how we generalize from K=2 to arbitrary K.
>
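The K=2 case above (one existing materialized view plus one delta table) can be sketched as a cost comparison between two refresh strategies. This is a toy model under stated assumptions, not how Tempura or Calcite costs plans: choose_refresh_plan, the per_delta_row_cost parameter and both cost formulas are hypothetical and only show why the chosen plan can flip with the size of the delta.

```python
def choose_refresh_plan(base_rows: int, delta_rows: int,
                        per_delta_row_cost: float = 5.0) -> str:
    """Pick the cheaper of two refresh strategies under a toy cost model.

    Assumed costs (illustrative only):
      recompute:   scan every base row plus every delta row once
      incremental: pay a fixed cost per delta row (for example a lookup
                   into the existing view)
    """
    recompute_cost = float(base_rows + delta_rows)
    incremental_cost = per_delta_row_cost * delta_rows
    return "incremental" if incremental_cost < recompute_cost else "recompute"


small_delta_plan = choose_refresh_plan(base_rows=1_000_000, delta_rows=1_000)
large_delta_plan = choose_refresh_plan(base_rows=1_000_000, delta_rows=500_000)
```

With these made-up numbers the small delta favours the incremental plan and the large delta favours recomputation. A real optimizer would derive both costs from plan alternatives and statistics rather than from fixed formulas.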
> Julian
>
>
> > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> >
> > I apologize that I had the wrong impression of the meeting time (I thought
> > it was on Thursday, but it was Wednesday). I can follow up on your meeting
> > records if you have any.
> >
> >
> > -Rui
> >
> > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> This is a reminder that we are going to have our second discussion
> meeting
> >> tomorrow at 10-11pm PST. Please find the link below, everyone is
> welcome to
> >> join!
> >>
> >> Join Zoom Meeting
> >> https://uci.zoom.us/j/91986206610
> >>
> >> Meeting ID: 919 8620 6610
> >> One tap mobile
> >> +16699006833 <(669)%20900-6833> <(669)%20900-6833>,,91986206610# US
> (San Jose)
> >> +12532158782 <(253)%20215-8782> <(253)%20215-8782>,,91986206610# US
> (Tacoma)
> >>
> >> Dial by your location
> >>        +1 669 900 6833 <(669)%20900-6833> <(669)%20900-6833> US (San
> Jose)
> >>        +1 253 215 8782 <(253)%20215-8782> <(253)%20215-8782> US
> (Tacoma)
> >>        +1 346 248 7799 <(346)%20248-7799> <(346)%20248-7799> US
> (Houston)
> >>        +1 301 715 8592 <(301)%20715-8592> <(301)%20715-8592> US
> (Washington DC)
> >>        +1 312 626 6799 <(312)%20626-6799> <(312)%20626-6799> US
> (Chicago)
> >>        +1 646 558 8656 <(646)%20558-8656> <(646)%20558-8656> US (New
> York)
> >> Meeting ID: 919 8620 6610
> >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
> >> <
> >>
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FacyXcc43Cd&sa=D&source=calendar&usd=2&usg=AOvVaw2W08kj_8hEx44dryeZlXb6
> >>>
> >>
> >> Join by Skype for Business
> >> https://uci.zoom.us/skype/91986206610
> >> <
> >>
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91986206610&sa=D&source=calendar&usd=2&usg=AOvVaw3w0M0YYbcjPyBXzNpyyk0Z
> >>>
> >>
> >> Thanks,
> >> Botong
> >>
> >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
> >>
> >>> Hi Stamatis and all,
> >>>
> >>> Thanks for the interest! Let's tentatively schedule the next meeting for
> >>> next Wednesday, May 12, 10pm-11pm PST. Please let us know if any new
> >>> needs come up.
> >>>
> >>> Best,
> >>> Botong
> >>>
> >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I really regret missing the first meeting, sorry about that. I added
> my
> >>>> preferences in the document.
> >>>> I will make sure to attend the next one and help as much as I can.
> >>>>
> >>>> I didn't have the chance yet to go over the paper but will try to do
> it
> >>>> before the next meeting.
> >>>>
> >>>> For me the following dates are more convenient than others so it would
> >> be
> >>>> nice if we could arrange it then.
> >>>>
> >>>> Thu, May 6, 10pm PST
> >>>> Tue, May 12, 10pm PST
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>>> I have added my time preferences to the doc [1]. I am generally
> >>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
> >>>>>
> >>>>> Stamatis, Jesus, Given the complexity of this work, I would very much
> >>>>> appreciate your insight, as experts in optimizer theory. Could one of
> >>>>> you join the next meeting? Of course we should choose a time that
> >>>>> works for everyone's schedule.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>
> >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>> We didn't record it, but we will try to record future meetings. Please
> >>>>>> add your time preference in the doc, so that we can find a meeting time
> >>>>>> that works for more people.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Botong
> >>>>>>
> >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
> >> viliam@hazelcast.com>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Is there a recording available?
> >>>>>>> Viliam
> >>>>>>>
> >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> The meeting yesterday was fun and productive. As discussed, this
> >>>> is
> >>>>> the
> >>>>>>>> call to schedule our second meeting.
> >>>>>>>>
> >>>>>>>> We encourage everyone to add their time preferences during
> >> 05/01 -
> >>>>> 05/15
> >>>>>>>> here:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Botong
> >>>>>>>>
> >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>> We've created a zoom meeting below for our meeting next Monday
> >>>>>>>>> (9pm-10:30pm PST on 04/26).
> >>>>>>>>> Talk to you all soon!
> >>>>>>>>>
> >>>>>>>>> Join Zoom Meeting
> >>>>>>>>> https://uci.zoom.us/j/91279732686
> >>>>>>>>>
> >>>>>>>>> Meeting ID: 912 7973 2686
> >>>>>>>>> One tap mobile
> >>>>>>>>> +16699006833,,91279732686# US (San Jose)
> >>>>>>>>> +12532158782,,91279732686# US (Tacoma)
> >>>>>>>>>
> >>>>>>>>> Dial by your location
> >>>>>>>>> +1 669 900 6833 US (San Jose)
> >>>>>>>>> +1 253 215 8782 US (Tacoma)
> >>>>>>>>> +1 346 248 7799 US (Houston)
> >>>>>>>>> +1 301 715 8592 US (Washington DC)
> >>>>>>>>> +1 312 626 6799 US (Chicago)
> >>>>>>>>> +1 646 558 8656 US (New York)
> >>>>>>>>> Meeting ID: 912 7973 2686
> >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
> >>>>>>>>>
> >>>>>>>>> Join by Skype for Business
> >>>>>>>>> https://uci.zoom.us/skype/91279732686
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Botong
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
> >> pkuhbt@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> According to the preferences collected, we are tentatively
> >>>>> scheduling
> >>>>>>>> our
> >>>>>>>>>> meeting at 9pm-10:30pm PST on 04/26 Monday.
> >>>>>>>>>>
> >>>>>>>>>> We will give a presentation about Tempura, followed by a free
> >>>>>>>> discussion.
> >>>>>>>>>>
> >>>>>>>>>> Please let us know if there are any other requests. A few days before
> >>>>>>>>>> the meeting, I will send out a Zoom meeting link.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Botong
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
> >> pkuhbt@gmail.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Julian and all,
> >>>>>>>>>>>
> >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take
> >> a
> >>>>> quick
> >>>>>>>> peek
> >>>>>>>>>>> at the last five commits.
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>
> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> >>>>>>>>>>>
> >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
> >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>),
> >> which
> >>>>> will
> >>>>>>>> serve
> >>>>>>>>>>> as the umbrella Jira for the feature.
> >>>>>>>>>>>
> >>>>>>>>>>> In the meantime, we encourage everyone to enter the time
> >>>>> preferences
> >>>>>>>> for
> >>>>>>>>>>> our first meeting here:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Botong
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> >>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I have added my time preferences to the doc.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Initial discussions will need to be about architecture and
> >>>>>>> high-level
> >>>>>>>>>>>> design. So I would ask Calcite reviewers not to review the
> >> PR
> >>>>>>>> line-by-line
> >>>>>>>>>>>> (or to leave comments in GitHub) but try to understand the
> >>>>> design
> >>>>>>>>>>>> holistically, and prepare questions/comments before the
> >>>> meeting.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Botong, Can you please create a Calcite JIRA case for this
> >>>> task?
> >>>>>>> JIRA
> >>>>>>>>>>>> is how we track long-running tasks such as this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
> >> pkuhbt@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Apologies for the delay. It took us some time to clean up
> >> our
> >>>>> code
> >>>>>>>> base
> >>>>>>>>>>>> and
> >>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
> >>>> peek.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
> >>>> for a
> >>>>> Zoom
> >>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
> >> Calcite.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Since some of our team members are in China, we prefer
> >> the
> >>>>> time
> >>>>>>> slot
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
> >> preference
> >>>> in
> >>>>> the
> >>>>>>>>>>>> shared
> >>>>>>>>>>>>> doc below.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We encourage everyone to add their time preferences
> >> (during
> >>>>>>>>>>>> 04/15-04/30) in
> >>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
> >>>> that
> >>>>> works
> >>>>>>>> for
> >>>>>>>>>>>>> most.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> >>>>> pkuhbt@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Julian and Rui,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
> >>>> some
> >>>>>>> slides
> >>>>>>>>>>>> for the
> >>>>>>>>>>>>>> meeting.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
> >> free
> >>>> to
> >>>>> add
> >>>>>>>>>>>> more in
> >>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
> >>>>> idea. I
> >>>>>>>>>>>> think we
> >>>>>>>>>>>>>>> should create it to continue discussion after the first
> >>>>> meeting.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> >>>>>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
> >>>> The
> >>>>> PR
> >>>>>>> will
> >>>>>>>>>>>> allow
> >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
> >>>>> round of
> >>>>>>>>>>>> questions
> >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
> >>>>>>>>>>>> presentation of the
> >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
> >>>> present
> >>>>> at
> >>>>>>>> VLDB,
> >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
> >>>> concepts,
> >>>>> which
> >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
> >>>> onto
> >>>>>>> other
> >>>>>>>>>>>> current
> >>>>>>>>>>>>>>> and future concepts in calcite.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
> >>>>> line-by-line
> >>>>>>> at
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
> >> and
> >>>>> design
> >>>>>>>>>>>> choices. If
> >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
> >>>> details.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
> >> doubt
> >>>>> that we
> >>>>>>>>>>>> will be
> >>>>>>>>>>>>>>> able to integrate everything, but we can build
> >>>> understanding
> >>>>>>> about
> >>>>>>>>>>>> where
> >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
> >>>> of
> >>>>> code
> >>>>>>> to
> >>>>>>>>>>>> help us
> >>>>>>>>>>>>>>> get there.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As I said before, after the integration I would like
> >>>>> people to
> >>>>>>> be
> >>>>>>>>>>>> able
> >>>>>>>>>>>>>>> to experiment with it and use it in their production
> >>>>> systems.
> >>>>>>>> That
> >>>>>>>>>>>> way, it
> >>>>>>>>>>>>>>> will not be an experiment that withers, but a feature
> >> set
> >>>>>>>>>>>> that integrates with
> >>>>>>>>>>>>>>> other calcite features and gets stronger over time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> >>>>> amaliujia@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> For me to participate in the discussion for the
> >> above
> >>>>>>>> questions,
> >>>>>>>>>>>> I
> >>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>> need to read a lot more to know relevant context and
> >>>>> likely
> >>>>>>> ask
> >>>>>>>>>>>> lots of
> >>>>>>>>>>>>>>>>> questions :-).  An editable doc is probably good for questions and
> >>>>>>>>>>>>>>>>> back-and-forth discussion.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Rui
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> >>>>>>>> amaliujia@apache.org
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
> >>>>> (review
> >>>>>>>> code
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>> doc,
> >>>>>>>>>>>>>>>>>> etc.).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> While you can share your code so people can have
> >> more
> >>>>> idea
> >>>>>>> how
> >>>>>>>>>>>> it is
> >>>>>>>>>>>>>>>>>> implemented, I think it would be also nice to have a
> >>>> doc
> >>>>> to
> >>>>>>>>>>>> discuss
> >>>>>>>>>>>>>>> open
> >>>>>>>>>>>>>>>>>> questions above. I have copied some of the points here:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing
> >>>>> solutions in
> >>>>>>>>>>>> Calcite
> >>>>>>>>>>>>>>>>>> Streaming, materialized view maintenance, and
> >>>> multi-query
> >>>>>>>>>>>> optimization
> >>>>>>>>>>>>>>>>>> (Sigma and Delta relational operators, lattice, and
> >>>> Spool
> >>>>>>>>>>>> operator)?
> >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost
> >>>> models
> >>>>> -
> >>>>>>> one
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>>> “view
> >>>>>>>>>>>>>>>>>> maintenance” and another for “user queries” - since
> >>>> the
> >>>>>>>>>>>> objectives of
> >>>>>>>>>>>>>>> each
> >>>>>>>>>>>>>>>>>> activity are so different?
> >>>>>>>>>>>>>>>>>> 3. whether this work will hasten the arrival of
> >>>>>>> multi-objective
> >>>>>>>>>>>>>>> parametric
> >>>>>>>>>>>>>>>>>> query optimization [1] in Calcite.
> >>>>>>>>>>>>>>>>>> 4. probably SQL shell support.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1]:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Rui
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> >>>>> zinking3@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> >>>>>>>>>>>> pkuhbt@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Julian,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
> >>>>> wondering
> >>>>>>> if
> >>>>>>>> it
> >>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>> help
> >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> >>>>>>>> pkuhbt@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Julian,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
> >>>> plan
> >>>>>>> that
> >>>>>>>>>>>> best
> >>>>>>>>>>>>>>>>>>> benefits
> >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
> >>>>> hopefully
> >>>>>>>>>>>> answer
> >>>>>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>>>> questions.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of
> >>>> time
> >>>>>>> points
> >>>>>>>> to
> >>>>>>>>>>>>>>>>>>> consider
> >>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users'
> >>>>>>> preference
> >>>>>>>>>>>> over
> >>>>>>>>>>>>>>>>>>> time,
> >>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan
> >>>> that
> >>>>>>>>>>>> minimizes the
> >>>>>>>>>>>>>>>>>>>> overall
> >>>>>>>>>>>>>>>>>>>>> cost function.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at
> >>>> different
> >>>>> time
> >>>>>>>>>>>> points
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to
> >> identical
> >>>>> plans
> >>>>>>> in
> >>>>>>>>>>>> all
> >>>>>>>>>>>>>>> delta
> >>>>>>>>>>>>>>>>>>>> runs
> >>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in §2.1 of
> >> the
> >>>>>>> Tempura
> >>>>>>>>>>>> paper,
> >>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>> mimic the current streaming implementation by
> >>>>> specifying
> >>>>>>> two
> >>>>>>>>>>>>>>> (logical)
> >>>>>>>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>> points in Tempura, representing the initial run
> >> and
> >>>>> later
> >>>>>>>>>>>> delta
> >>>>>>>>>>>>>>> runs
> >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura
> >>>> supports
> >>>>>>> various
> >>>>>>>>>>>> forms
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> incremental computing, not only the small-delta
> >>>>>>> append-only
> >>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>> model in
> >>>>>>>>>>>>>>>>>>>>> streaming systems. That's why we believe Tempura
> >>>>> subsumes
> >>>>>>>> the
> >>>>>>>>>>>>>>> current
> >>>>>>>>>>>>>>>>>>>>> streaming support, as well as any IVM
> >>>> implementations.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a
> >>>>> separate
> >>>>>>>> cost
> >>>>>>>>>>>>>>> model,
> >>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>> rather extended the existing one. Similar to
> >>>>>>> multi-objective
> >>>>>>>>>>>>>>>>>>>> optimization,
> >>>>>>>>>>>>>>>>>>>>> costs incurred at different time points are
> >>>> considered
> >>>>>>>>>>>> different
> >>>>>>>>>>>>>>>>>>>>> dimensions. Tempura lets users supply a function
> >>>> that
> >>>>>>>>>>>> converts this
> >>>>>>>>>>>>>>>>>>> cost
> >>>>>>>>>>>>>>>>>>>>> vector into a final cost. So under this function,
> >>>> any
> >>>>> two
> >>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>> plans
> >>>>>>>>>>>>>>>>>>>>> are still comparable and there is an overall
> >>>> optimum.
> >>>>> I
> >>>>>>>> guess
> >>>>>>>>>>>> we
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>> go
> >>>>>>>>>>>>>>>>>>>>> down the route of multi-objective parametric
> >> query
> >>>>>>>>>>>> optimization
> >>>>>>>>>>>>>>>>>>> instead
> >>>>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>> there is a need.
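
The cost-vector idea above can be sketched as follows, under stated assumptions: each candidate incremental plan carries one cost per time point, and a user-supplied function folds that vector into a single number so that any two plans can be compared. The class and method names here are hypothetical, and the weighted sum is only one possible choice of user function; this is not Tempura's actual interface.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration only; not Tempura's or Calcite's API.
final class CostVectorDemo {
    // Returns the index of the plan with the lowest combined cost, or -1 if
    // the list is empty. Each double[] holds one cost per time point.
    static int bestPlan(List<double[]> costVectors,
                        ToDoubleFunction<double[]> combine) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < costVectors.size(); i++) {
            double cost = combine.applyAsDouble(costVectors.get(i));
            if (cost < bestCost) {
                bestCost = cost;
                best = i;
            }
        }
        return best;
    }

    // One example of a user-supplied function: a weighted sum over time points.
    // Assumes weights.length is at least the length of each cost vector.
    static ToDoubleFunction<double[]> weightedSum(double... weights) {
        return costs -> {
            double total = 0;
            for (int t = 0; t < costs.length; t++) {
                total += weights[t] * costs[t];
            }
            return total;
        };
    }
}
```

With two time points, a plan that does most of its work early and a plan that defers work to the end can rank differently depending on the weights, which is how the user's preference over time enters the comparison.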
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Next on materialized views and multi-query
> >>>>> optimization,
> >>>>>>>>>>>> since our
> >>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves
> >>>> materializing
> >>>>>>>>>>>> intermediate
> >>>>>>>>>>>>>>>>>>>> results
> >>>>>>>>>>>>>>>>>>>>> for later time points, we need to solve the
> >>>> problem of
> >>>>>>>>>>>> choosing
> >>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving
> >> and
> >>>>>>> reusing
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing
> >> plans.
> >>>> We
> >>>>>>>>>>>> borrowed the
> >>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this
> >>>>> problem
> >>>>>>>> even
> >>>>>>>>>>>>>>> though
> >>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we
> >>>> think
> >>>>> our
> >>>>>>>> work
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> orthogonal
> >>>>>>>>>>>>>>>>>>>>> to Calcite's facilities around utilizing existing
> >>>>> views,
> >>>>>>>>>>>> lattice
> >>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>>>>> feel that the multi-query optimization component
> >>>> can
> >>>>> be
> >>>>>>>>>>>> adopted to
> >>>>>>>>>>>>>>>>>>> wider
> >>>>>>>>>>>>>>>>>>>>> use, but probably need more suggestions from the
> >>>>>>> community.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in
> >>>> Java
> >>>>> code;
> >>>>>>>> it
> >>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> straightforward to hook it up with SQL shell.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Botong,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this
> >>>>> research,
> >>>>>>>> and
> >>>>>>>>>>>> thank
> >>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite:
> >>>>> streaming,
> >>>>>>>>>>>>>>>>>>> materialized
> >>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization.
> >>>> As we
> >>>>>>> have
> >>>>>>>>>>>> already
> >>>>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta
> >>>> relational
> >>>>>>>>>>>> operators,
> >>>>>>>>>>>>>>>>>>> lattice,
> >>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to
> >> see
> >>>>>>> whether
> >>>>>>>>>>>> we can
> >>>>>>>>>>>>>>>>>>> make
> >>>>>>>>>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume
> >>>>> others.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that
> >>>> your
> >>>>>>>>>>>> relations
> >>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>> used
> >>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure
> >>>> streaming
> >>>>>>>>>>>> queries, the
> >>>>>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find
> >>>>> that you
> >>>>>>>>>>>> needed
> >>>>>>>>>>>>>>> two
> >>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view
> >> maintenance”
> >>>> and
> >>>>>>>>>>>> another for
> >>>>>>>>>>>>>>>>>>> “user
> >>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity
> >>>> are
> >>>>> so
> >>>>>>>>>>>> different?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the
> >>>> arrival of
> >>>>>>>>>>>>>>> multi-objective
> >>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read
> >>>> and
> >>>>>>> digest
> >>>>>>>>>>>> your
> >>>>>>>>>>>>>>>>>>> paper.
> >>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth
> >>>>> process
> >>>>>>> to
> >>>>>>>>>>>> create
> >>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader
> >>>>> community.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making
> >> this
> >>>>>>>>>>>> functionality
> >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can
> >>>>> experiment
> >>>>>>>>>>>> with
> >>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or
> >>>> setting up
> >>>>>>>> complex
> >>>>>>>>>>>>>>>>>>> databases
> >>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the
> >> simple
> >>>>> DDL
> >>>>>>>>>>>> operations
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> >>>>> whether
> >>>>>>> we
> >>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>> devise
> >>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> >>>>>>>> pkuhbt@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the
> >>>>> figure,
> >>>>>>>> please
> >>>>>>>>>>>>>>> refer
> >>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> Fig
> >>>>>>>>>>>>>>>>>>>>>>> 3(a) in our paper:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> >>>>>>>>>>>> taojiatao@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you open
> >>>>>>>>>>>>>>>>>>>>>>>> a JIRA for this, so that people who are interested can subscribe to it?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regards!
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite
> >>>> optimizer
> >>>>>>> into
> >>>>>>>> a
> >>>>>>>>>>>>>>> general
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our
> >>>> research
> >>>>>>> paper
> >>>>>>>>>>>>>>>>>>> published
> >>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> VLDB
> >>>>>>>>>>>>>>>>>>>>>>>>> 2021:
> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer
> >>>> framework
> >>>>> for
> >>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>>>>>>> processing
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020
> >> illustrating
> >>>>> how
> >>>>>>>>>>>> Alibaba’s
> >>>>>>>>>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental
> >>>>> query
> >>>>>>>>>>>> optimizer
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> alleviate
> >>>>>>>>>>>>>>>>>>>>>>>>> cluster-wise resource skewness:
> >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
> >>>>> Resource-Aware
> >>>>>>>>>>>>>>> Incremental
> >>>>>>>>>>>>>>>>>>>>>>>> Computing
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To the best of our knowledge, this is the first
> >>>> general
> >>>>>>>>>>>> cost-based
> >>>>>>>>>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>>>>>>> optimizer that can find the best plan across
> >>>>> multiple
> >>>>>>>>>>>> families
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental computing methods, including IVM,
> >>>>>>> Streaming,
> >>>>>>>>>>>>>>>>>>> DBToaster,
> >>>>>>>>>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>>>>>>>> Experiments (in the paper) show that the
> >>>>> generated
> >>>>>>> best
> >>>>>>>>>>>> plan
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>>>>> consistently much better than the plans from
> >>>> each
> >>>>>>>>>>>> individual
> >>>>>>>>>>>>>>>>>>> method
> >>>>>>>>>>>>>>>>>>>>>>>> alone.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is
> >>>> central
> >>>>> to
> >>>>>>>>>>>> database
> >>>>>>>>>>>>>>>>>>> view
> >>>>>>>>>>>>>>>>>>>>>>>>> maintenance and stream processing systems,
> >> and
> >>>> is
> >>>>>>> being
> >>>>>>>>>>>>>>> adopted
> >>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>> active
> >>>>>>>>>>>>>>>>>>>>>>>>> databases, resumable query execution,
> >>>> approximate
> >>>>>>> query
> >>>>>>>>>>>>>>>>>>> processing,
> >>>>>>>>>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>>>>>>>>>>> are hoping that this feature can help
> >> widen
> >>>> the
> >>>>>>>>>>>> spectrum of
> >>>>>>>>>>>>>>>>>>>>>> Calcite,
> >>>>>>>>>>>>>>>>>>>>>>>>> solicit more use cases and adoption of
> >> Calcite.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical
> >>>>> details.
> >>>>>>>>>>>> Please
> >>>>>>>>>>>>>>>>>>> refer
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura paper for more details. We are also
> >>>>> working
> >>>>>>> on a
> >>>>>>>>>>>>>>> journal
> >>>>>>>>>>>>>>>>>>>>>> version
> >>>>>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>> the paper with more implementation details.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite
> >>>> is
> >>>>> meant
> >>>>>>>> to
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>> executed
> >>>>>>>>>>>>>>>>>>>>>>>>> altogether at once. In the proposal,
> >> Calcite’s
> >>>>> memo
> >>>>>>> will
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>> extended
> >>>>>>>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>>>>> temporal information so that it is capable of
> >>>>>>> generating
> >>>>>>>>>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>>>>>>> plans
> >>>>>>>>>>>>>>>>>>>>>>>>> that include multiple sub-plans to execute at
> >>>>>>> different
> >>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>> points.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one
> >> that
> >>>>>>> changes
> >>>>>>>>>>>> over
> >>>>>>>>>>>>>>> time
> >>>>>>>>>>>>>>>>>>>>>> (Time
> >>>>>>>>>>>>>>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> >>>>>>> introduced
> >>>>>>>>>>>>>>>>>>> TvrMetaSet
> >>>>>>>>>>>>>>>>>>>>>> into
> >>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset
> >> to
> >>>>> track
> >>>>>>>>>>>> related
> >>>>>>>>>>>>>>>>>>> RelSets
> >>>>>>>>>>>>>>>>>>>>>> of a
> >>>>>>>>>>>>>>>>>>>>>>>>> changing table (e.g. snapshot of the table at
> >>>>> certain
> >>>>>>>>>>>> time,
> >>>>>>>>>>>>>>>>>>> delta of
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> table between two time points, etc.).
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each
> >> vertical
> >>>>> line
> >>>>>>> is a
> >>>>>>>>>>>>>>>>>>> TvrMetaSet
> >>>>>>>>>>>>>>>>>>>>>>>>> representing a TVR (S, R, S left outer join
> >> R,
> >>>>> etc.).
> >>>>>>>>>>>>>>> Horizontal
> >>>>>>>>>>>>>>>>>>>> lines
> >>>>>>>>>>>>>>>>>>>>>>>>> represent time. Each black dot in the grid
> >> is a
> >>>>>>> RelSet.
> >>>>>>>>>>>> Users
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>>> write
> >>>>>>>>>>>>>>>>>>>>>>>> TVR
> >>>>>>>>>>>>>>>>>>>>>>>>> Rewrite Rules to describe valid
> >> transformations
> >>>>>>> between
> >>>>>>>>>>>> these
> >>>>>>>>>>>>>>>>>>> dots.
> >>>>>>>>>>>>>>>>>>>>>> For
> >>>>>>>>>>>>>>>>>>>>>>>>> example, the blues lines are inter-TVR rules
> >>>> that
> >>>>>>>>>>>> describe how
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>> compute
> >>>>>>>>>>>>>>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other
> >>>>> TVRs.
> >>>>>>> The
> >>>>>>>>>>>> red
> >>>>>>>>>>>>>>> lines
> >>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>>>> intra-TVR rules that describe transformations
> >>>>> within a
> >>>>>>>>>>>> TVR. All
> >>>>>>>>>>>>>>>>>>> TVR
> >>>>>>>>>>>>>>>>>>>>>>>> rewrite
> >>>>>>>>>>>>>>>>>>>>>>>>> rules are logical rules. All existing Calcite
> >>>>> rules
> >>>>>>>> still
> >>>>>>>>>>>> work
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>>>>>>>> volcano system without modification.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> All changes in this feature will consist of
> >>>> four
> >>>>>>> parts:
> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching
> >>>>> TvrMetaSet
> >>>>>>>> and
> >>>>>>>>>>>>>>>>>>> RelNodes,
> >>>>>>>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>>>>>>> well as links in between the nodes.
> >>>>>>>>>>>>>>>>>>>>>>>>> 3. A basic set of TvrRules, written using the
> >>>>> upgraded
> >>>>>>>>>>>> rule
> >>>>>>>>>>>>>>>>>>> engine
> >>>>>>>>>>>>>>>>>>>>>> API.
> >>>>>>>>>>>>>>>>>>>>>>>>> 4. Multi-query optimization, used to find the
> >>>> best
> >>>>>>>>>>>> incremental
> >>>>>>>>>>>>>>>>>>> plan
> >>>>>>>>>>>>>>>>>>>>>>>>> involving multiple time points.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Note that this feature is an extension in
> >>>> nature
> >>>>> and
> >>>>>>>> thus
> >>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>>>> disabled,
> >>>>>>>>>>>>>>>>>>>>>>>>> does not change any existing Calcite
> >> behavior.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Other than scenarios in the paper, we also
> >>>> applied
> >>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>> Calcite-extended
> >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer to a type of
> >>>> periodic
> >>>>>>> query
> >>>>>>>>>>>> called
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> ‘‘range
> >>>>>>>>>>>>>>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It
> >>>> achieved
> >>>>> cost
> >>>>>>>>>>>> savings
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> 80%
> >>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> >>>>>>> end-to-end
> >>>>>>>>>>>>>>> execution
> >>>>>>>>>>>>>>>>>>>>>> time.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> All comments and suggestions are welcome.
> >>>> Thanks
> >>>>> and
> >>>>>>>> happy
> >>>>>>>>>>>>>>>>>>> holidays!
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~
> >>>>>>>>>>>>>>>>>>> no mistakes
> >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Viliam Durina
> >>>>>>> Jet Developer
> >>>>>>>      hazelcast®
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi Julian and all,

Please find our rebased Tempura code on top of a fairly recent version of
Calcite at:
https://github.com/hbtoo/calcite/tree/botong

There are six new commits in total:
739119282 Hack around the type issue to make tvr unit tests work, see
CALCITE-4713
c84da7216 4. Full tempura integration: all TVR rules and complete
optimization procedure. Newly added files only.
d3217f432 3. Full tempura integration: all TVR rules and complete
optimization procedure. Modifications to existing files only.
59342a654 2. Tempura core memo structure, rule engine, and interfaces.
Newly added files only.
0d310841d 1. Tempura core memo structure, rule engine, and interfaces.
Modifications to existing files only.
c1240ca7b 0. Add volcano visualizer for debugging.

The first three commits (0, 1, 2) form a compilable version with extended
core system support:
1. Memo extension with TvrMetaSet
2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
well as links in between the nodes.

The next two commits (3 and 4) complete the full version with:
3. A provided set of TvrRules, written using the upgraded rule engine API.
4. TvrVolcanoPlanner that puts everything together end to end.
5. Multi-query optimization, used to find the best incremental plan
involving multiple time points.
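
As a rough illustration of what "best incremental plan" means here (a
minimal sketch, not Tempura code): earlier in this thread the cost of a plan
is described as a vector with one entry per time point, which a
user-supplied function converts into a single final cost. The class name,
plan names, cost numbers, and weights below are all made up for
illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: none of these names exist in Calcite or Tempura.
final class IncrementalPlanCostSketch {

    /** A user preference expressed as a weighted sum over per-time-point costs. */
    static ToDoubleFunction<double[]> weightedSum(double... weights) {
        return costs -> {
            double total = 0.0;
            for (int i = 0; i < costs.length; i++) {
                total += weights[i] * costs[i];
            }
            return total;
        };
    }

    /** Returns the name of the candidate whose final cost is lowest. */
    static String cheapest(Map<String, double[]> candidates,
                           ToDoubleFunction<double[]> costFn) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : candidates.entrySet()) {
            double c = costFn.applyAsDouble(e.getValue());
            if (c < bestCost) {
                bestCost = c;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Cost vectors: {cost at the early time point, cost at the final time point}.
        Map<String, double[]> candidates = new LinkedHashMap<>();
        candidates.put("recompute-at-end", new double[] {0.0, 100.0});
        candidates.put("incremental", new double[] {80.0, 30.0});

        // Early resources are cheap: doing work ahead of time wins.
        System.out.println(cheapest(candidates, weightedSum(0.2, 1.0)));
        // All time points cost the same: one batch run at the end wins.
        System.out.println(cheapest(candidates, weightedSum(1.0, 1.0)));
    }
}
```

With these made-up numbers the first call picks "incremental" and the second
picks "recompute-at-end", so the same candidates can rank differently under
different user preferences.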

With commits 0 through 4 applied, all existing Calcite unit tests pass.


To demonstrate how Tempura works, we have added the following two example
unit tests that can be run directly (with the last commit to hack around
CALCITE-4713):

TvrOptimizationTest.java runs the Tempura optimizer. It produces a
progressive physical plan that runs across several time points. The
physical plan is printed to the console in DOT format, which can be viewed
using an online Graphviz tool.
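
For readers who have not seen DOT before, here is a minimal sketch of the
format. It is not the actual output of TvrOptimizationTest; the class name
and the operator labels are invented for illustration, and only the general
"digraph { a -> b; }" shape is standard Graphviz.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: renders a tiny, invented plan graph as DOT text.
final class PlanDotSketch {

    /** Renders directed edges (input operator -> consumer) as a DOT digraph. */
    static String toDot(String graphName, List<String[]> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(graphName).append(" {\n");
        for (String[] edge : edges) {
            // Quote node names so that spaces and punctuation are allowed.
            sb.append("  \"").append(edge[0]).append("\" -> \"")
              .append(edge[1]).append("\";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented operators at two time points, t1 and t2.
        List<String[]> edges = Arrays.asList(
            new String[] {"Scan S at t1", "Join at t1"},
            new String[] {"Scan R at t1", "Join at t1"},
            new String[] {"Join at t1", "Merge at t2"},
            new String[] {"Delta S at t2", "Merge at t2"});
        System.out.print(toDot("plan", edges));
    }
}
```

The printed text can be pasted into any Graphviz viewer to see the graph.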

TvrExecutionTest.java uses the Tempura optimizer in an end-to-end query.
This program generates a progressive physical plan and then uses Calcite's
built-in executor to run the plan. The output at each time point is printed
to the console.
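
To make "the output at each time point" concrete, here is a toy sketch that
has nothing to do with Tempura's actual classes: a running SUM is kept as
materialized state, and each time point folds in only the rows that arrived
since the previous one. The class name and the data are assumptions made for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of progressive execution for SELECT SUM(x) FROM t.
final class ProgressiveSumSketch {

    /** Returns the SUM visible after each time point, given per-time-point deltas. */
    static List<Long> runningSums(List<long[]> deltas) {
        List<Long> outputs = new ArrayList<>();
        long state = 0L; // materialized result carried from one time point to the next
        for (long[] delta : deltas) {
            for (long value : delta) {
                state += value; // only the new rows are processed
            }
            outputs.add(state);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<long[]> deltas = Arrays.asList(
            new long[] {5, 7}, // rows arriving by t1
            new long[] {3},    // rows arriving between t1 and t2
            new long[] {});    // nothing new by t3
        List<Long> outputs = runningSums(deltas);
        for (int t = 0; t < outputs.size(); t++) {
            System.out.println("t" + (t + 1) + ": SUM = " + outputs.get(t));
        }
    }
}
```

A real progressive plan may use a different sub-plan at each time point; this
sketch uses the same delta-merge step throughout, which is the simplest case.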

Everyone is welcome and encouraged to take a look and play with it. Let's
take some time and figure out a plan on how to incorporate Tempura into
Calcite that best suits everyone.


Thanks,
Botong

On Fri, May 14, 2021 at 9:32 AM Botong Huang <pk...@gmail.com> wrote:

> Hi all,
>
> Thank you all for the interest, and thanks Julian for the update!
>
> I am having problems uploading the pdf files into the Jira:
> https://issues.apache.org/jira/browse/CALCITE-4568
> so I am attaching the slides and the original paper here in this email.
>
> The slides have a walking example of how Tempura expands its memo. The
> current code base is at:
> alibaba/cost-based-incremental-optimizer
> <https://github.com/alibaba/cost-based-incremental-optimizer>
> with two e2e unit tests at TvrOptimizationTest.java and
> TvrExecutionTest.java.
>
> Please feel free to start playing with them, and feel free to reach out
> and possibly schedule another meeting if needed.
>
> As agreed in the meeting, we will rebase our code to a newer version of
> Calcite.
>
> Best,
> Botong
>
> On Thu, May 13, 2021 at 12:47 PM Julian Hyde <jh...@gmail.com>
> wrote:
>
>> During the meeting we agreed to start progressing this contribution in
>> the usual Apache Way, with conversations on the dev list and in the
>> https://issues.apache.org/jira/browse/CALCITE-4568 <
>> https://issues.apache.org/jira/browse/CALCITE-4568> JIRA case. So, it
>> should be easy for you to participate.
>>
>> Botong said he would share the slides. (He might be unwilling to make
>> them public, because they are his presentation for a conference that has
>> not happened yet. Reach out to him one-to-one.)
>>
>> Next step is for someone on the Alibaba side to create a PR that is
>> rebased on the latest Calcite master, and add a comment to the JIRA case.
>> Then we can discuss what needs to be done for that PR. Code quality, adding
>> comments, breaking up into smaller commits, additional tests, renaming
>> packages/classes, restructuring into plugins are all possibilities.
>>
>> Our side of the bargain, as committers, is that we should review in a
>> timely manner, and not move the goal posts — if the contributors make the
>> changes we request then we will land this code in master in a reasonable
>> amount of time.
>>
>> We also discussed incremental view maintenance (IVM). Tempura solves a
>> more general problem (finding the optimal K steps to maintain a
>> materialized view as data arrives in K points in time) but if we set K=2,
>> we can generate a plan for how to update a materialized view given a delta
>> table. The plan will be different based on cost - e.g. whether the delta
>> table is small or large. This is a problem that many of our users would
>> like to solve. It will exercise much of Tempura’s code base, and encourage
>> contributions.
>>
>> In my opinion, we should do IVM at launch. It should be the main example
>> we use in conference talks, blog posts, etc. When people understand that
>> case, we can explain how we generalize from K=2 to arbitrary K.
>>
>> Julian
>>
>>
>> > On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
>> >
>> > I apologize that I had a wrong impression on the meeting time (I
>> thought it
>> > should be on Thursday but it is Wednesday). I can follow up your meeting
>> > records if you have any.
>> >
>> >
>> > -Rui
>> >
>> > On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> This is a reminder that we are going to have our second discussion
>> meeting
>> >> tomorrow at 10-11pm PST. Please find the link below, everyone is
>> welcome to
>> >> join!
>> >>
>> >> Join Zoom Meeting
>> >> https://uci.zoom.us/j/91986206610
>> >>
>> >> Meeting ID: 919 8620 6610
>> >> One tap mobile
>> >> +16699006833 <(669)%20900-6833>,,91986206610# US (San Jose)
>> >> +12532158782 <(253)%20215-8782>,,91986206610# US (Tacoma)
>> >>
>> >> Dial by your location
>> >>        +1 669 900 6833 <(669)%20900-6833> US (San Jose)
>> >>        +1 253 215 8782 <(253)%20215-8782> US (Tacoma)
>> >>        +1 346 248 7799 <(346)%20248-7799> US (Houston)
>> >>        +1 301 715 8592 <(301)%20715-8592> US (Washington DC)
>> >>        +1 312 626 6799 <(312)%20626-6799> US (Chicago)
>> >>        +1 646 558 8656 <(646)%20558-8656> US (New York)
>> >> Meeting ID: 919 8620 6610
>> >> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
>> >>
>> >> Join by Skype for Business
>> >> https://uci.zoom.us/skype/91986206610
>> >>
>> >> Thanks,
>> >> Botong
>> >>
>> >> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
>> >>
>> >>> Hi Stamatis and all,
>> >>>
>> >>> Thanks for the interest! Let's tentatively schedule the next meeting
>> next
>> >>> Wednesday at May 12, 10pm-11pm PST then. Please let us know if there's
>> >> new
>> >>> needs showing up.
>> >>>
>> >>> Best,
>> >>> Botong
>> >>>
>> >>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <zabetak@gmail.com
>> >
>> >>> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> I really regret missing the first meeting, sorry about that. I added
>> my
>> >>>> preferences in the document.
>> >>>> I will make sure to attend the next one and help as much as I can.
>> >>>>
>> >>>> I didn't have the chance yet to go over the paper but will try to do
>> it
>> >>>> before the next meeting.
>> >>>>
>> >>>> For me the following dates are more convenient than others so it
>> would
>> >> be
>> >>>> nice if we could arrange it then.
>> >>>>
>> >>>> Thu, May 6, 10pm PST
>> >>>> Tue, May 12, 10pm PST
>> >>>>
>> >>>> Best,
>> >>>> Stamatis
>> >>>>
>> >>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
>> >>>>
>> >>>>> I have added my time preferences to the doc [1]. I am generally
>> >>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
>> >>>>>
>> >>>>> Stamatis, Jesus, Given the complexity of this work, I would very
>> much
>> >>>>> appreciate your insight, as experts in optimizer theory. Could one
>> of
>> >>>>> you join the next meeting? Of course we should choose a time that
>> >>>>> works for everyone's schedule.
>> >>>>>
>> >>>>> Julian
>> >>>>>
>> >>>>> [1]
>> >>>>>
>> >>>>
>> >>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >>>>>
>> >>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
>> >> wrote:
>> >>>>>>
>> >>>>>> We didn't record it, we will try to record the following meetings.
>> >>>> Please
>> >>>>>> add your time preference in the docs, so that we can find a meeting
>> >>>> time
>> >>>>>> that works for more people.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Botong
>> >>>>>>
>> >>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
>> >> viliam@hazelcast.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>> Is there a recording available?
>> >>>>>>> Viliam
>> >>>>>>>
>> >>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
>> >>>> wrote:
>> >>>>>>>
>> >>>>>>>> Hi all,
>> >>>>>>>>
>> >>>>>>>> The meeting yesterday was fun and productive. As discussed, this
>> >>>> is
>> >>>>> the
>> >>>>>>>> call to schedule our second meeting.
>> >>>>>>>>
>> >>>>>>>> We encourage everyone to add their time preferences during
>> >> 05/01 -
>> >>>>> 05/15
>> >>>>>>>> here:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Botong
>> >>>>>>>>
>> >>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
>> >>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi all,
>> >>>>>>>>> We've created a zoom meeting below for our meeting next Monday
>> >>>>>>>>> (9pm-10:30pm PST on 04/26).
>> >>>>>>>>> Talk to you all soon!
>> >>>>>>>>>
>> >>>>>>>>> Join Zoom Meeting
>> >>>>>>>>> https://uci.zoom.us/j/91279732686
>> >>>>>>>>> <
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Meeting ID: 912 7973 2686
>> >>>>>>>>> One tap mobile
>> >>>>>>>>> +16699006833 <(669)%20900-6833>,,91279732686# US (San Jose)
>> >>>>>>>>> +12532158782 <(253)%20215-8782>,,91279732686# US (Tacoma)
>> >>>>>>>>>
>> >>>>>>>>> Dial by your location
>> >>>>>>>>> +1 669 900 6833 <(669)%20900-6833> US (San Jose)
>> >>>>>>>>> +1 253 215 8782 <(253)%20215-8782> US (Tacoma)
>> >>>>>>>>> +1 346 248 7799 <(346)%20248-7799> US (Houston)
>> >>>>>>>>> +1 301 715 8592 <(301)%20715-8592> US (Washington DC)
>> >>>>>>>>> +1 312 626 6799 <(312)%20626-6799> US (Chicago)
>> >>>>>>>>> +1 646 558 8656 <(646)%20558-8656> US (New York)
>> >>>>>>>>> Meeting ID: 912 7973 2686
>> >>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
>> >>>>>>>>> <
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Join by Skype for Business
>> >>>>>>>>> https://uci.zoom.us/skype/91279732686
>> >>>>>>>>> <
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>> Botong
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
>> >> pkuhbt@gmail.com
>> >>>>>
>> >>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi all,
>> >>>>>>>>>>
>> >>>>>>>>>> According to the preferences collected, we are tentatively
>> >>>>> scheduling
>> >>>>>>>> our
>> >>>>>>>>>> meeting at 9pm-10:30pm PST on 04/26 Monday.
>> >>>>>>>>>>
>> >>>>>>>>>> We will give a presentation about Tempura, followed by a free
>> >>>>>>>> discussion.
>> >>>>>>>>>>
>> >>>>>>>>>> Please let us know if there are new other requests. Few days
>> >>>>> before
>> >>>>>>>>>> the meeting, I will send out a zoom meeting link.
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks,
>> >>>>>>>>>> Botong
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
>> >> pkuhbt@gmail.com>
>> >>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi Julian and all,
>> >>>>>>>>>>>
>> >>>>>>>>>>> We've posted the Tempura code base below. Feel free to take
>> >> a
>> >>>>> quick
>> >>>>>>>> peek
>> >>>>>>>>>>> at the last five commits.
>> >>>>>>>>>>>
>> >>>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>> >>>>>>>>>>>
>> >>>>>>>>>>> I've also opened a Jira (CALCITE-4568
>> >>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>),
>> >> which
>> >>>>> will
>> >>>>>>>> serve
>> >>>>>>>>>>> as the umbrella Jira for the feature.
>> >>>>>>>>>>>
>> >>>>>>>>>>> In the meantime, we encourage everyone to enter the time
>> >>>>> preferences
>> >>>>>>>> for
>> >>>>>>>>>>> our first meeting here:
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Botong
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
>> >>>>> jhyde.apache@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> I have added my time preferences to the doc.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Initial discussions will need to be about architecture and
>> >>>>>>> high-level
>> >>>>>>>>>>>> design. So I would ask Calcite reviewers not to review the
>> >> PR
>> >>>>>>>> line-by-line
>> >>>>>>>>>>>> (or to leave comments in GitHub) but try to understand the
>> >>>>> design
>> >>>>>>>>>>>> holistically, and prepare questions/comments before the
>> >>>> meeting.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Botong, Can you please create a Calcite JIRA case for this
>> >>>> task?
>> >>>>>>> JIRA
>> >>>>>>>>>>>> is how we track long-running tasks such as this.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Julian
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
>> >> pkuhbt@gmail.com
>> >>>>>
>> >>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Apology for the delay. It took us some time to clean up
>> >> our
>> >>>>> code
>> >>>>>>>> base
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
>> >>>> peek.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
>> >>>> for a
>> >>>>> Zoom
>> >>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
>> >> Calcite.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Since some of our team members are in China, we prefer
>> >> the
>> >>>>> time
>> >>>>>>> slot
>> >>>>>>>>>>>> of
>> >>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
>> >> preference
>> >>>> in
>> >>>>> the
>> >>>>>>>>>>>> shared
>> >>>>>>>>>>>>> doc below.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> We encourage everyone to add their time preferences
>> >> (during
>> >>>>>>>>>>>> 04/15-04/30) in
>> >>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
>> >>>> that
>> >>>>> works
>> >>>>>>>> for
>> >>>>>>>>>>>>> most.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
>> >>>>> pkuhbt@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Hi Julian and Rui,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
>> >>>> some
>> >>>>>>> slides
>> >>>>>>>>>>>> for the
>> >>>>>>>>>>>>>> meeting.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
>> >> free
>> >>>> to
>> >>>>> add
>> >>>>>>>>>>>> more in
>> >>>>>>>>>>>>>> here:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
>> >>>>>>>> jhyde.apache@gmail.com
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
>> >>>>> idea. I
>> >>>>>>>>>>>> think we
>> >>>>>>>>>>>>>>> should create it to continue discussion after the first
>> >>>>> meeting.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Julian
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
>> >>>>>>>> jhyde.apache@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting.
>> >>>> The
>> >>>>> PR
>> >>>>>>> will
>> >>>>>>>>>>>> allow
>> >>>>>>>>>>>>>>> us to read the code, but I think we should do the first
>> >>>>> round of
>> >>>>>>>>>>>> questions
>> >>>>>>>>>>>>>>> at the meeting.  The meeting could perhaps start with a
>> >>>>>>>>>>>> presentation of the
>> >>>>>>>>>>>>>>> paper (do you have some slides you are planning to
>> >>>> present
>> >>>>> at
>> >>>>>>>> VLDB,
>> >>>>>>>>>>>>>>> Botong?) and then move on to questions about the
>> >>>> concepts,
>> >>>>> which
>> >>>>>>>>>>>>>>> alternatives were considered, and how the concepts map
>> >>>> onto
>> >>>>>>> other
>> >>>>>>>>>>>> current
>> >>>>>>>>>>>>>>> and future concepts in calcite.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR
>> >>>>> line-by-line
>> >>>>>>> at
>> >>>>>>>>>>>> this
>> >>>>>>>>>>>>>>> point. We need to understand the high-level concepts
>> >> and
>> >>>>> design
>> >>>>>>>>>>>> choices. If
>> >>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the
>> >>>> details.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I know that integrating a major change is hard; I
>> >> doubt
>> >>>>> that we
>> >>>>>>>>>>>> will be
>> >>>>>>>>>>>>>>> able to integrate everything, but we can build
>> >>>> understanding
>> >>>>>>> about
>> >>>>>>>>>>>> where
>> >>>>>>>>>>>>>>> calcite needs to go, and I hope integrate a good amount
>> >>>> of
>> >>>>> code
>> >>>>>>> to
>> >>>>>>>>>>>> help us
>> >>>>>>>>>>>>>>> get there.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> As I said before, after the integration I would like
>> >>>>> people to
>> >>>>>>> be
>> >>>>>>>>>>>> able
>> >>>>>>>>>>>>>>> to experiment with it and use it in their production
>> >>>>> systems.
>> >>>>>>>> That
>> >>>>>>>>>>>> way, it
>> >>>>>>>>>>>>>>> will not be an experiment that withers, but a feature
>> >> set
>> >>>>>>>>>>>> that integrates with
>> >>>>>>>>>>>>>>> other calcite features and gets stronger over time.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Julian
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
>> >>>>> amaliujia@apache.org>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> For me to participate in the discussion for the
>> >> above
>> >>>>>>>> questions,
>> >>>>>>>>>>>> I
>> >>>>>>>>>>>>>>> will
>> >>>>>>>>>>>>>>>>> need to read a lot more to know relevant context and
>> >>>>> likely
>> >>>>>>> ask
>> >>>>>>>>>>>> lots of
>> >>>>>>>>>>>>>>>>> questions :-).  An editable doc is probably good for
>> >>>>> questions
>> >>>>>>>> and
>> >>>>>>>>>>>> back
>> >>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>> forward discussion.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> -Rui
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
>> >>>>>>>> amaliujia@apache.org
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite
>> >>>>> (review
>> >>>>>>>> code
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>>>> doc,
>> >>>>>>>>>>>>>>>>>> etc.).
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> While you can share your code so people can have
>> >> more
>> >>>>> idea
>> >>>>>>> how
>> >>>>>>>>>>>> it is
>> >>>>>>>>>>>>>>>>>> implemented, I think it would be also nice to have a
>> >>>> doc
>> >>>>> to
>> >>>>>>>>>>>> discuss
>> >>>>>>>>>>>>>>> open
>> >>>>>>>>>>>>>>>>>> questions above. Some points that I am copying
>> >>>> here:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing
>> >>>>> solutions in
>> >>>>>>>>>>>> Calcite
>> >>>>>>>>>>>>>>>>>> Streaming, materialized view maintenance, and
>> >>>> multi-query
>> >>>>>>>>>>>> optimization
>> >>>>>>>>>>>>>>>>>> (Sigma and Delta relational operators, lattice, and
>> >>>> Spool
>> >>>>>>>>>>>> operator),
>> >>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost
>> >>>> models
>> >>>>> -
>> >>>>>>> one
>> >>>>>>>>>>>> for
>> >>>>>>>>>>>>>>> “view
>> >>>>>>>>>>>>>>>>>> maintenance” and another for “user queries” - since
>> >>>> the
>> >>>>>>>>>>>> objectives of
>> >>>>>>>>>>>>>>> each
>> >>>>>>>>>>>>>>>>>> activity are so different?
>> >>>>>>>>>>>>>>>>>> 3. whether this work will hasten the arrival of
>> >>>>>>> multi-objective
>> >>>>>>>>>>>>>>> parametric
>> >>>>>>>>>>>>>>>>>> query optimization [1] in Calcite.
>> >>>>>>>>>>>>>>>>>> 4. probably SQL shell support.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> [1]:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> -Rui
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
>> >>>>> zinking3@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> it would be very nice to see a POC of your work.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
>> >>>>>>>>>>>> pkuhbt@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Hi Julian,
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are
>> >>>>> wondering
>> >>>>>>> if
>> >>>>>>>> it
>> >>>>>>>>>>>>>>> would
>> >>>>>>>>>>>>>>>>>>> help
>> >>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
>> >>>>>>>> pkuhbt@gmail.com
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Hi Julian,
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure let's figure out a
>> >>>> plan
>> >>>>>>> that
>> >>>>>>>>>>>> best
>> >>>>>>>>>>>>>>>>>>> benefits
>> >>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that
>> >>>>> hopefully
>> >>>>>>>>>>>> answer
>> >>>>>>>>>>>>>>> your
>> >>>>>>>>>>>>>>>>>>>>> questions.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of
>> >>>> time
>> >>>>>>> points
>> >>>>>>>> to
>> >>>>>>>>>>>>>>>>>>> consider
>> >>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users'
>> >>>>>>> preference
>> >>>>>>>>>>>> over
>> >>>>>>>>>>>>>>>>>>> time,
>> >>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan
>> >>>> that
>> >>>>>>>>>>>> minimizes the
>> >>>>>>>>>>>>>>>>>>>> overall
>> >>>>>>>>>>>>>>>>>>>>> cost function.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at
>> >>>> different
>> >>>>> time
>> >>>>>>>>>>>> points
>> >>>>>>>>>>>>>>> can
>> >>>>>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to
>> >> identical
>> >>>>> plans
>> >>>>>>> in
>> >>>>>>>>>>>> all
>> >>>>>>>>>>>>>>> delta
>> >>>>>>>>>>>>>>>>>>>> runs
>> >>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in $2.1 of
>> >> the
>> >>>>>>> Tempura
>> >>>>>>>>>>>> paper,
>> >>>>>>>>>>>>>>> we
>> >>>>>>>>>>>>>>>>>>> can
>> >>>>>>>>>>>>>>>>>>>>> mimic the current streaming implementation by
>> >>>>> specifying
>> >>>>>>> two
>> >>>>>>>>>>>>>>> (logical)
>> >>>>>>>>>>>>>>>>>>>> time
>> >>>>>>>>>>>>>>>>>>>>> points in Tempura, representing the initial run
>> >> and
>> >>>>> later
>> >>>>>>>>>>>> delta
>> >>>>>>>>>>>>>>> runs
>> >>>>>>>>>>>>>>>>>>>>> respectively. In general, note that Tempura
>> >>>> supports
>> >>>>>>> various
>> >>>>>>>>>>>> form
>> >>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>>> incremental computing, not only the small-delta
>> >>>>>>> append-only
>> >>>>>>>>>>>> data
>> >>>>>>>>>>>>>>>>>>> model in
>> >>>>>>>>>>>>>>>>>>>>> streaming systems. That's why we believe Tempura
>> >>>>> subsumes
>> >>>>>>>> the
>> >>>>>>>>>>>>>>> current
>> >>>>>>>>>>>>>>>>>>>>> streaming support, as well as any IVM
>> >>>> implementations.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a
>> >>>>> separate
>> >>>>>>>> cost
>> >>>>>>>>>>>>>>> model,
>> >>>>>>>>>>>>>>>>>>> but
>> >>>>>>>>>>>>>>>>>>>>> rather extended the existing one. Similar to
>> >>>>>>> multi-objective
>> >>>>>>>>>>>>>>>>>>>> optimization,
>> >>>>>>>>>>>>>>>>>>>>> costs incurred at different time points are
>> >>>> considered
>> >>>>>>>>>>>> different
>> >>>>>>>>>>>>>>>>>>>>> dimensions. Tempura lets users supply a function
>> >>>> that
>> >>>>>>>>>>>> converts this
>> >>>>>>>>>>>>>>>>>>> cost
>> >>>>>>>>>>>>>>>>>>>>> vector into a final cost. So under this function,
>> >>>> any
>> >>>>> two
>> >>>>>>>>>>>>>>> incremental
>> >>>>>>>>>>>>>>>>>>>> plans
>> >>>>>>>>>>>>>>>>>>>>> are still comparable and there is an overall
>> >>>> optimum.
>> >>>>> I
>> >>>>>>>> guess
>> >>>>>>>>>>>> we
>> >>>>>>>>>>>>>>> can
>> >>>>>>>>>>>>>>>>>>> go
>> >>>>>>>>>>>>>>>>>>>>> down the route of multi-objective parametric
>> >> query
>> >>>>>>>>>>>> optimization
>> >>>>>>>>>>>>>>>>>>> instead
>> >>>>>>>>>>>>>>>>>>>> if
>> >>>>>>>>>>>>>>>>>>>>> there is a need.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Next, on materialized views and multi-query
>> >>>>>>>>>>>>>>>>>>>>> optimization: since our multi-time-point plan
>> >>>>>>>>>>>>>>>>>>>>> naturally involves materializing intermediate
>> >>>>>>>>>>>>>>>>>>>>> results for later time points, we need to solve
>> >>>>>>>>>>>>>>>>>>>>> the problem of choosing materializations and
>> >>>>>>>>>>>>>>>>>>>>> include the cost of saving and reusing the
>> >>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing plans.
>> >>>>>>>>>>>>>>>>>>>>> We borrowed the multi-query optimization
>> >>>>>>>>>>>>>>>>>>>>> techniques to solve this problem even though we
>> >>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we
>> >>>>>>>>>>>>>>>>>>>>> think our work is orthogonal to Calcite's
>> >>>>>>>>>>>>>>>>>>>>> facilities around utilizing existing views,
>> >>>>>>>>>>>>>>>>>>>>> lattice etc. We do feel that the multi-query
>> >>>>>>>>>>>>>>>>>>>>> optimization component can be adopted for wider
>> >>>>>>>>>>>>>>>>>>>>> use, but we probably need more suggestions from
>> >>>>>>>>>>>>>>>>>>>>> the community.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in
>> >>>>>>>>>>>>>>>>>>>>> Java code; it should be straightforward to hook it
>> >>>>>>>>>>>>>>>>>>>>> up with a SQL shell.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>> >>>>>>>>>>>>>>> jhyde.apache@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Botong,
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this
>> >>>>> research,
>> >>>>>>>> and
>> >>>>>>>>>>>> thank
>> >>>>>>>>>>>>>>>>>>> you
>> >>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite:
>> >>>>> streaming,
>> >>>>>>>>>>>>>>>>>>> materialized
>> >>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization.
>> >>>> As we
>> >>>>>>> have
>> >>>>>>>>>>>> already
>> >>>>>>>>>>>>>>>>>>> some
>> >>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta
>> >>>> relational
>> >>>>>>>>>>>> operators,
>> >>>>>>>>>>>>>>>>>>> lattice,
>> >>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to
>> >> see
>> >>>>>>> whether
>> >>>>>>>>>>>> we can
>> >>>>>>>>>>>>>>>>>>> make
>> >>>>>>>>>>>>>>>>>>>> them
>> >>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume
>> >>>>> others.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that
>> >>>> your
>> >>>>>>>>>>>> relations
>> >>>>>>>>>>>>>>> are
>> >>>>>>>>>>>>>>>>>>> used
>> >>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure
>> >>>> streaming
>> >>>>>>>>>>>> queries, the
>> >>>>>>>>>>>>>>>>>>> only
>> >>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find
>> >>>>> that you
>> >>>>>>>>>>>> needed
>> >>>>>>>>>>>>>>> two
>> >>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view
>> >> maintenance”
>> >>>> and
>> >>>>>>>>>>>> another for
>> >>>>>>>>>>>>>>>>>>> “user
>> >>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity
>> >>>> are
>> >>>>> so
>> >>>>>>>>>>>> different?
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the
>> >>>> arrival of
>> >>>>>>>>>>>>>>> multi-objective
>> >>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read
>> >>>> and
>> >>>>>>> digest
>> >>>>>>>>>>>> your
>> >>>>>>>>>>>>>>>>>>> paper.
>> >>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth
>> >>>>> process
>> >>>>>>> to
>> >>>>>>>>>>>> create
>> >>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader
>> >>>>> community.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making
>> >> this
>> >>>>>>>>>>>> functionality
>> >>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can
>> >>>>> experiment
>> >>>>>>>>>>>> with
>> >>>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or
>> >>>> setting up
>> >>>>>>>> complex
>> >>>>>>>>>>>>>>>>>>> databases
>> >>>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the
>> >> simple
>> >>>>> DDL
>> >>>>>>>>>>>> operations
>> >>>>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>>>>> are
>> >>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder
>> >>>>> whether
>> >>>>>>> we
>> >>>>>>>>>>>> could
>> >>>>>>>>>>>>>>>>>>> devise
>> >>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Julian
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
>> >>>>>>>> pkuhbt@gmail.com
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the
>> >>>>> figure,
>> >>>>>>>> please
>> >>>>>>>>>>>>>>> refer
>> >>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>> Fig
>> >>>>>>>>>>>>>>>>>>>>>>> 3(a) in our paper:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>> >>>>>>>>>>>> taojiatao@gmail.com>
>> >>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Seems interesting, the pic can not be seen in
>> >>>> the
>> >>>>> mail,
>> >>>>>>>>>>>> may you
>> >>>>>>>>>>>>>>>>>>> open
>> >>>>>>>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>>>>>> JIRA
>> >>>>>>>>>>>>>>>>>>>>>>>> for this, people who are interested in this
>> >> can
>> >>>>>>> subscribe
>> >>>>>>>>>>>> to the
>> >>>>>>>>>>>>>>>>>>>> JIRA?
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Regards!
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite
>> >>>> optimizer
>> >>>>>>> into
>> >>>>>>>> a
>> >>>>>>>>>>>>>>> general
>> >>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our
>> >>>> research
>> >>>>>>> paper
>> >>>>>>>>>>>>>>>>>>> published
>> >>>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>>>>> VLDB
>> >>>>>>>>>>>>>>>>>>>>>>>>> 2021:
>> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer
>> >>>> framework
>> >>>>> for
>> >>>>>>>>>>>>>>> incremental
>> >>>>>>>>>>>>>>>>>>>> data
>> >>>>>>>>>>>>>>>>>>>>>>>>> processing
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020
>> >> illustrating
>> >>>>> how
>> >>>>>>>>>>>> Alibaba’s
>> >>>>>>>>>>>>>>>>>>> data
>> >>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental
>> >>>>> query
>> >>>>>>>>>>>> optimizer
>> >>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>>>> alleviate
>> >>>>>>>>>>>>>>>>>>>>>>>>> cluster-wise resource skewness:
>> >>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
>> >>>>> Resource-Aware
>> >>>>>>>>>>>>>>> Incremental
>> >>>>>>>>>>>>>>>>>>>>>>>> Computing
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> To the best of our knowledge, this is the first
>> >>>>>>>>>>>>>>>>>>>>>>>>> general cost-based incremental optimizer that can
>> >>>>>>>>>>>>>>>>>>>>>>>>> find the best plan across multiple families of
>> >>>>>>>>>>>>>>>>>>>>>>>>> incremental computing methods, including IVM,
>> >>>>>>>>>>>>>>>>>>>>>>>>> Streaming, DBToaster, etc. Experiments (in the
>> >>>>>>>>>>>>>>>>>>>>>>>>> paper) show that the generated best plan is
>> >>>>>>>>>>>>>>>>>>>>>>>>> consistently much better than the plans from each
>> >>>>>>>>>>>>>>>>>>>>>>>>> individual method alone.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is central
>> >>>>>>>>>>>>>>>>>>>>>>>>> to database view maintenance and stream processing
>> >>>>>>>>>>>>>>>>>>>>>>>>> systems, and is being adopted in active databases,
>> >>>>>>>>>>>>>>>>>>>>>>>>> resumable query execution, approximate query
>> >>>>>>>>>>>>>>>>>>>>>>>>> processing, etc. We are hoping that this feature
>> >>>>>>>>>>>>>>>>>>>>>>>>> can help widen the spectrum of Calcite and solicit
>> >>>>>>>>>>>>>>>>>>>>>>>>> more use cases and adoption of Calcite.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical
>> >>>>> details.
>> >>>>>>>>>>>> Please
>> >>>>>>>>>>>>>>>>>>> refer
>> >>>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>>>> Tempura paper for more details. We are also
>> >>>>> working
>> >>>>>>> on a
>> >>>>>>>>>>>>>>> journal
>> >>>>>>>>>>>>>>>>>>>>>> version
>> >>>>>>>>>>>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>>>>>>> the paper with more implementation details.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite is
>> >>>>>>>>>>>>>>>>>>>>>>>>> meant to be executed all at once. In the proposal,
>> >>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo will be extended with temporal
>> >>>>>>>>>>>>>>>>>>>>>>>>> information so that it is capable of generating
>> >>>>>>>>>>>>>>>>>>>>>>>>> incremental plans that include multiple sub-plans
>> >>>>>>>>>>>>>>>>>>>>>>>>> to execute at different time points.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one that
>> >>>>>>>>>>>>>>>>>>>>>>>>> changes over time (a Time-Varying Relation, TVR).
>> >>>>>>>>>>>>>>>>>>>>>>>>> To achieve that we introduced TvrMetaSet into
>> >>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo, besides RelSet and RelSubset, to
>> >>>>>>>>>>>>>>>>>>>>>>>>> track related RelSets of a changing table (e.g.
>> >>>>>>>>>>>>>>>>>>>>>>>>> a snapshot of the table at a certain time, the
>> >>>>>>>>>>>>>>>>>>>>>>>>> delta of the table between two time points, etc.).
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each vertical
>> >>>>>>>>>>>>>>>>>>>>>>>>> line is a TvrMetaSet representing a TVR (S, R,
>> >>>>>>>>>>>>>>>>>>>>>>>>> S left outer join R, etc.). Horizontal lines
>> >>>>>>>>>>>>>>>>>>>>>>>>> represent time. Each black dot in the grid is a
>> >>>>>>>>>>>>>>>>>>>>>>>>> RelSet. Users can write TVR Rewrite Rules to
>> >>>>>>>>>>>>>>>>>>>>>>>>> describe valid transformations between these dots.
>> >>>>>>>>>>>>>>>>>>>>>>>>> For example, the blue lines are inter-TVR rules
>> >>>>>>>>>>>>>>>>>>>>>>>>> that describe how to compute a certain RelSet of a
>> >>>>>>>>>>>>>>>>>>>>>>>>> TVR from RelSets of other TVRs. The red lines are
>> >>>>>>>>>>>>>>>>>>>>>>>>> intra-TVR rules that describe transformations
>> >>>>>>>>>>>>>>>>>>>>>>>>> within a TVR. All TVR rewrite rules are logical
>> >>>>>>>>>>>>>>>>>>>>>>>>> rules. All existing Calcite rules still work in
>> >>>>>>>>>>>>>>>>>>>>>>>>> the new volcano system without modification.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> All changes in this feature will consist of four
>> >>>>>>>>>>>>>>>>>>>>>>>>> parts:
>> >>>>>>>>>>>>>>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>> >>>>>>>>>>>>>>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching
>> >>>>>>>>>>>>>>>>>>>>>>>>>    TvrMetaSet and RelNodes, as well as links in
>> >>>>>>>>>>>>>>>>>>>>>>>>>    between the nodes.
>> >>>>>>>>>>>>>>>>>>>>>>>>> 3. A basic set of TvrRules, written using the
>> >>>>>>>>>>>>>>>>>>>>>>>>>    upgraded rule engine API.
>> >>>>>>>>>>>>>>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
>> >>>>>>>>>>>>>>>>>>>>>>>>>    incremental plan involving multiple time points.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Note that this feature is an extension in nature
>> >>>>>>>>>>>>>>>>>>>>>>>>> and thus, when disabled, does not change any
>> >>>>>>>>>>>>>>>>>>>>>>>>> existing Calcite behavior.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Other than scenarios in the paper, we also applied
>> >>>>>>>>>>>>>>>>>>>>>>>>> this Calcite-extended incremental query optimizer
>> >>>>>>>>>>>>>>>>>>>>>>>>> to a type of periodic query called the “range
>> >>>>>>>>>>>>>>>>>>>>>>>>> query” in Alibaba’s data warehouse. It achieved
>> >>>>>>>>>>>>>>>>>>>>>>>>> cost savings of 80% on total CPU and memory
>> >>>>>>>>>>>>>>>>>>>>>>>>> consumption, and 60% on end-to-end execution time.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> All comments and suggestions are welcome.
>> >>>> Thanks
>> >>>>> and
>> >>>>>>>> happy
>> >>>>>>>>>>>>>>>>>>> holidays!
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~
>> >>>>>>>>>>>>>>>>>>> no mistakes
>> >>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Viliam Durina
>> >>>>>>> Jet Developer
>> >>>>>>>      hazelcast®
>> >>>>>>>
>> >>>>>>>  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo,
>> >> CA
>> >>>>> 94402 |
>> >>>>>>> USA
>> >>>>>>> +1 (650) 521-5453 <(650)%20521-5453> | hazelcast.com <
>> >> https://www.hazelcast.com>
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Julian Hyde <jh...@gmail.com>.
During the meeting we agreed to start progressing this contribution in the usual Apache Way, with conversations on the dev list and in the https://issues.apache.org/jira/browse/CALCITE-4568 JIRA case. So, it should be easy for you to participate.

Botong said he would share the slides. (He might be unwilling to make them public, because they are his presentation for a conference that has not happened yet. Reach out to him one-to-one.)

Next step is for someone on the Alibaba side to create a PR that is rebased on the latest Calcite master, and add a comment to the JIRA case. Then we can discuss what needs to be done for that PR. Code quality, adding comments, breaking up into smaller commits, additional tests, renaming packages/classes, restructuring into plugins are all possibilities.

Our side of the bargain, as committers, is that we should review in a timely manner, and not move the goal posts: if the contributors make the changes we request then we will land this code in master in a reasonable amount of time.

We also discussed incremental view maintenance (IVM). Tempura solves a more general problem (finding the optimal K steps to maintain a materialized view as data arrives in K points in time) but if we set K=2, we can generate a plan for how to update a materialized view given a delta table. The plan will be different based on cost - e.g. whether the delta table is small or large. This is a problem that many of our users would like to solve. It will exercise much of Tempura’s code base, and encourage contributions.
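
To make the K=2 case concrete, here is a toy sketch in Java of how the choice between two maintenance plans could flip with the size of the delta. It assumes each plan carries one estimated cost per time point and that a caller-supplied function collapses that cost vector into a single number, as Botong described earlier in the thread. The class and method names (IvmPlanChoice, Plan, cheapest, candidates, choose) and the cost formulas are invented for illustration; they are not Calcite or Tempura APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Toy illustration only; none of these names come from Calcite or Tempura. */
final class IvmPlanChoice {

    /** A candidate plan with one estimated cost per time point (hypothetical type). */
    static final class Plan {
        final String name;
        final double[] costPerTimePoint;

        Plan(String name, double... costPerTimePoint) {
            this.name = name;
            this.costPerTimePoint = costPerTimePoint;
        }
    }

    /** Returns the plan whose cost vector maps to the smallest scalar cost. */
    static Plan cheapest(List<Plan> plans, ToDoubleFunction<double[]> combine) {
        Plan best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Plan plan : plans) {
            double cost = combine.applyAsDouble(plan.costPerTimePoint);
            if (cost < bestCost) {
                best = plan;
                bestCost = cost;
            }
        }
        return best;
    }

    /** Builds two K=2 candidates: index 0 is the initial run, index 1 applies the delta. */
    static List<Plan> candidates(double baseRows, double deltaRows) {
        // Assumed toy cost: recomputing rescans base plus delta at the second time point.
        Plan recompute = new Plan("recompute", baseRows, baseRows + deltaRows);
        // Assumed toy cost: the incremental plan pays 20% extra up front to keep
        // state, then 10 cost units per delta row to merge it into the view.
        Plan incremental = new Plan("incremental", 1.2 * baseRows, 10.0 * deltaRows);
        return Arrays.asList(recompute, incremental);
    }

    /** Chooses a plan using the sum of per-time-point costs as the final cost. */
    static String choose(double baseRows, double deltaRows) {
        ToDoubleFunction<double[]> sum = costs -> Arrays.stream(costs).sum();
        return cheapest(candidates(baseRows, deltaRows), sum).name;
    }

    public static void main(String[] args) {
        System.out.println(choose(1000, 10));    // small delta
        System.out.println(choose(1000, 1000));  // large delta
    }
}
```

With these made-up numbers, a base table of 1000 rows and a delta of 10 rows favors the incremental plan, while a delta of 1000 rows favors recomputing. A different combining function (say, one that weights the second time point more heavily) could change the outcome.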

In my opinion, we should do IVM at launch. It should be the main example we use in conference talks, blog posts, etc. When people understand that case, we can explain how we generalize from K=2 to arbitrary K.

Julian


> On May 13, 2021, at 9:51 AM, Rui Wang <am...@apache.org> wrote:
> 
> I apologize that I had a wrong impression on the meeting time (I thought it
> should be on Thursday but it is Wednesday). I can follow up your meeting
> records if you have any.
> 
> 
> -Rui
> 
> On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:
> 
>> Hi all,
>> 
>> This is a reminder that we are going to have our second discussion meeting
>> tomorrow at 10-11pm PST. Please find the link below, everyone is welcome to
>> join!
>> 
>> Join Zoom Meeting
>> https://uci.zoom.us/j/91986206610
>> 
>> Meeting ID: 919 8620 6610
>> One tap mobile
>> +16699006833,,91986206610# US (San Jose)
>> +12532158782,,91986206610# US (Tacoma)
>> 
>> Dial by your location
>>        +1 669 900 6833 US (San Jose)
>>        +1 253 215 8782 US (Tacoma)
>>        +1 346 248 7799 US (Houston)
>>        +1 301 715 8592 US (Washington DC)
>>        +1 312 626 6799 US (Chicago)
>>        +1 646 558 8656 US (New York)
>> Meeting ID: 919 8620 6610
>> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
>> 
>> Join by Skype for Business
>> https://uci.zoom.us/skype/91986206610
>> 
>> Thanks,
>> Botong
>> 
>> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
>> 
>>> Hi Stamatis and all,
>>> 
>>> Thanks for the interest! Let's tentatively schedule the next meeting for
>>> next Wednesday, May 12, 10pm-11pm PST. Please let us know if any new
>>> needs come up.
>>> 
>>> Best,
>>> Botong
>>> 
>>> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I really regret missing the first meeting, sorry about that. I added my
>>>> preferences in the document.
>>>> I will make sure to attend the next one and help as much as I can.
>>>> 
>>>> I didn't have the chance yet to go over the paper but will try to do it
>>>> before the next meeting.
>>>> 
>>>> For me the following dates are more convenient than others so it would
>> be
>>>> nice if we could arrange it then.
>>>> 
>>>> Thu, May 6, 10pm PST
>>>> Tue, May 12, 10pm PST
>>>> 
>>>> Best,
>>>> Stamatis
>>>> 
>>>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>>> I have added my time preferences to the doc [1]. I am generally
>>>>> available any evening Mon - Thu. How about we meet Monday 10th May?
>>>>> 
>>>>> Stamatis, Jesus, Given the complexity of this work, I would very much
>>>>> appreciate your insight, as experts in optimizer theory. Could one of
>>>>> you join the next meeting? Of course we should choose a time that
>>>>> works for everyone's schedule.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>>> 
>>>>> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> We didn't record it, but we will try to record the following meetings.
>>>>>> Please add your time preference in the doc, so that we can find a
>>>>>> meeting time that works for more people.
>>>>>> 
>>>>>> Thanks,
>>>>>> Botong
>>>>>> 
>>>>>> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <
>> viliam@hazelcast.com>
>>>>> wrote:
>>>>>> 
>>>>>>> Is there a recording available?
>>>>>>> Viliam
>>>>>>> 
>>>>>>> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> The meeting yesterday was fun and productive. As discussed, this
>>>> is
>>>>> the
>>>>>>>> call to schedule our second meeting.
>>>>>>>> 
>>>>>>>> We encourage everyone to add their time preferences during
>> 05/01 -
>>>>> 05/15
>>>>>>>> here:
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Botong
>>>>>>>> 
>>>>>>>> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> We've created a zoom meeting below for our meeting next Monday
>>>>>>>>> (9pm-10:30pm PST on 04/26).
>>>>>>>>> Talk to you all soon!
>>>>>>>>> 
>>>>>>>>> Join Zoom Meeting
>>>>>>>>> https://uci.zoom.us/j/91279732686
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Meeting ID: 912 7973 2686
>>>>>>>>> One tap mobile
>>>>>>>>> +16699006833,,91279732686# US (San Jose)
>>>>>>>>> +12532158782,,91279732686# US (Tacoma)
>>>>>>>>> 
>>>>>>>>> Dial by your location
>>>>>>>>> +1 669 900 6833 US (San Jose)
>>>>>>>>> +1 253 215 8782 US (Tacoma)
>>>>>>>>> +1 346 248 7799 US (Houston)
>>>>>>>>> +1 301 715 8592 US (Washington DC)
>>>>>>>>> +1 312 626 6799 US (Chicago)
>>>>>>>>> +1 646 558 8656 US (New York)
>>>>>>>>> Meeting ID: 912 7973 2686
>>>>>>>>> Find your local number: https://uci.zoom.us/u/aykHTkJBh
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Join by Skype for Business
>>>>>>>>> https://uci.zoom.us/skype/91279732686
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Botong
>>>>>>>>> 
>>>>>>>>> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <
>> pkuhbt@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> According to the preferences collected, we are tentatively
>>>>> scheduling
>>>>>>>> our
>>>>>>>>>> meeting at 9pm-10:30pm PST on 04/26 Monday.
>>>>>>>>>> 
>>>>>>>>>> We will give a presentation about Tempura, followed by a free
>>>>>>>> discussion.
>>>>>>>>>> 
>>>>>>>>>> Please let us know if there are any other requests. A few days
>>>>>>>>>> before the meeting, I will send out a Zoom meeting link.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Botong
>>>>>>>>>> 
>>>>>>>>>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <
>> pkuhbt@gmail.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Julian and all,
>>>>>>>>>>> 
>>>>>>>>>>> We've posted the Tempura code base below. Feel free to take
>> a
>>>>> quick
>>>>>>>> peek
>>>>>>>>>>> at the last five commits.
>>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>>>>>>>>>>> 
>>>>>>>>>>> I've also opened a Jira (CALCITE-4568
>>>>>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-4568>),
>> which
>>>>> will
>>>>>>>> serve
>>>>>>>>>>> as the umbrella Jira for the feature.
>>>>>>>>>>> 
>>>>>>>>>>> In the meantime, we encourage everyone to enter the time
>>>>> preferences
>>>>>>>> for
>>>>>>>>>>> our first meeting here:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Botong
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
>>>>> jhyde.apache@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I have added my time preferences to the doc.
>>>>>>>>>>>> 
>>>>>>>>>>>> Before we meet, could you publish a PR for us to review?
>>>>>>>>>>>> 
>>>>>>>>>>>> Initial discussions will need to be about architecture and
>>>>>>> high-level
>>>>>>>>>>>> design. So I would ask Calcite reviewers not to review the
>> PR
>>>>>>>> line-by-line
>>>>>>>>>>>> (or to leave comments in GitHub) but try to understand the
>>>>> design
>>>>>>>>>>>> holistically, and prepare questions/comments before the
>>>> meeting.
>>>>>>>>>>>> 
>>>>>>>>>>>> Botong, can you please create a Calcite JIRA case for this task?
>>>>>>>>>>>> JIRA is how we track long-running tasks such as this.
>>>>>>>>>>>> 
>>>>>>>>>>>> Julian
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Apr 3, 2021, at 5:15 PM, Botong Huang <
>> pkuhbt@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Apology for the delay. It took us some time to clean up
>> our
>>>>> code
>>>>>>>> base
>>>>>>>>>>>> and
>>>>>>>>>>>>> publicly release it (which will be out soon) for a quick
>>>> peek.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We are ready to present our work. Let's schedule a time
>>>> for a
>>>>> Zoom
>>>>>>>>>>>>> meeting and discuss how to integrate Tempura into
>> Calcite.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Since some of our team members are in China, we prefer
>> the
>>>>> time
>>>>>>> slot
>>>>>>>>>>>> of
>>>>>>>>>>>>> 7:00pm-11:30pm PST any day. I've added our time
>> preference
>>>> in
>>>>> the
>>>>>>>>>>>> shared
>>>>>>>>>>>>> doc below.
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We encourage everyone to add their time preferences
>> (during
>>>>>>>>>>>> 04/15-04/30) in
>>>>>>>>>>>>> this doc. In a week or so, we will try to settle a time
>>>> that
>>>>> works
>>>>>>>> for
>>>>>>>>>>>>> most.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Botong
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
>>>>> pkuhbt@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Julian and Rui,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Sounds good to us. Please give us some time to prepare
>>>> some
>>>>>>> slides
>>>>>>>>>>>> for the
>>>>>>>>>>>>>> meeting.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I've created a doc below for discussion. Please feel
>> free
>>>> to
>>>>> add
>>>>>>>>>>>> more in
>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Botong
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
>>>>>>>> jhyde.apache@gmail.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> PS The “editable doc” that Rui refers to is also a good
>>>>> idea. I
>>>>>>>>>>>> think we
>>>>>>>>>>>>>>> should create it to continue discussion after the first
>>>>> meeting.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jhyde.apache@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think good next steps would be a PR and a meeting. The PR will allow us
>>>>>>>>>>>>>>>> to read the code, but I think we should do the first round of questions at
>>>>>>>>>>>>>>>> the meeting. The meeting could perhaps start with a presentation of the
>>>>>>>>>>>>>>>> paper (do you have some slides you are planning to present at VLDB,
>>>>>>>>>>>>>>>> Botong?) and then move on to questions about the concepts, which
>>>>>>>>>>>>>>>> alternatives were considered, and how the concepts map onto other current
>>>>>>>>>>>>>>>> and future concepts in Calcite.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don’t think we should start “reviewing” the PR line-by-line at this
>>>>>>>>>>>>>>>> point. We need to understand the high-level concepts and design choices. If
>>>>>>>>>>>>>>>> we start reviewing the PR we will get lost in the details.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I know that integrating a major change is hard; I doubt that we will be
>>>>>>>>>>>>>>>> able to integrate everything, but we can build understanding about where
>>>>>>>>>>>>>>>> Calcite needs to go, and I hope integrate a good amount of code to help us
>>>>>>>>>>>>>>>> get there.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I said before, after the integration I would like people to be able to
>>>>>>>>>>>>>>>> experiment with it and use it in their production systems. That way, it
>>>>>>>>>>>>>>>> will not be an experiment that withers, but a feature set that integrates
>>>>>>>>>>>>>>>> with other Calcite features and gets stronger over time.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <amaliujia@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For me to participate in the discussion for the above questions, I will
>>>>>>>>>>>>>>>>> need to read a lot more to know relevant context and likely ask lots of
>>>>>>>>>>>>>>>>> questions :-). An editable doc is probably good for questions and
>>>>>>>>>>>>>>>>> back-and-forth discussion.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -Rui
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <amaliujia@apache.org> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am also happy to help push this work into Calcite (review code and doc,
>>>>>>>>>>>>>>>>>> etc.).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> While you can share your code so people can have more idea how it is
>>>>>>>>>>>>>>>>>> implemented, I think it would be also nice to have a doc to discuss the
>>>>>>>>>>>>>>>>>> open questions above. Some points that I copy here:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Can this solution be compatible with existing solutions in Calcite for
>>>>>>>>>>>>>>>>>> streaming, materialized view maintenance, and multi-query optimization
>>>>>>>>>>>>>>>>>> (Sigma and Delta relational operators, lattice, and Spool operator)?
>>>>>>>>>>>>>>>>>> 2. Did you find that you needed two separate cost models - one for “view
>>>>>>>>>>>>>>>>>> maintenance” and another for “user queries” - since the objectives of each
>>>>>>>>>>>>>>>>>> activity are so different?
>>>>>>>>>>>>>>>>>> 3. Will this work hasten the arrival of multi-objective parametric query
>>>>>>>>>>>>>>>>>> optimization [1] in Calcite?
>>>>>>>>>>>>>>>>>> 4. Probably SQL shell support.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Rui
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zinking3@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It would be very nice to see a POC of your work.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pkuhbt@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Julian,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Just wondering if there are any updates? We are wondering if it would help
>>>>>>>>>>>>>>>>>>>> to post our code for a quick preview.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Botong
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pkuhbt@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Julian,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best benefits
>>>>>>>>>>>>>>>>>>>>> the community. Here are some clarifications that hopefully answer your
>>>>>>>>>>>>>>>>>>>>> questions.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In our work (Tempura), users specify the set of time points to consider
>>>>>>>>>>>>>>>>>>>>> running and a cost function that expresses users' preference over time;
>>>>>>>>>>>>>>>>>>>>> Tempura will generate the best incremental plan that minimizes the overall
>>>>>>>>>>>>>>>>>>>>> cost function.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In this incremental plan, the sub-plans at different time points can be
>>>>>>>>>>>>>>>>>>>>> different from each other, as opposed to identical plans in all delta runs
>>>>>>>>>>>>>>>>>>>>> as in streaming or IVM. As mentioned in Section 2.1 of the Tempura paper,
>>>>>>>>>>>>>>>>>>>>> we can mimic the current streaming implementation by specifying two
>>>>>>>>>>>>>>>>>>>>> (logical) time points in Tempura, representing the initial run and later
>>>>>>>>>>>>>>>>>>>>> delta runs respectively. In general, note that Tempura supports various
>>>>>>>>>>>>>>>>>>>>> forms of incremental computing, not only the small-delta append-only data
>>>>>>>>>>>>>>>>>>>>> model in streaming systems. That's why we believe Tempura subsumes the
>>>>>>>>>>>>>>>>>>>>> current streaming support, as well as any IVM implementations.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> About the cost model, we did not come up with a separate cost model, but
>>>>>>>>>>>>>>>>>>>>> rather extended the existing one. Similar to multi-objective optimization,
>>>>>>>>>>>>>>>>>>>>> costs incurred at different time points are considered different
>>>>>>>>>>>>>>>>>>>>> dimensions. Tempura lets users supply a function that converts this cost
>>>>>>>>>>>>>>>>>>>>> vector into a final cost. So under this function, any two incremental plans
>>>>>>>>>>>>>>>>>>>>> are still comparable and there is an overall optimum. I guess we can go
>>>>>>>>>>>>>>>>>>>>> down the route of multi-objective parametric query optimization instead if
>>>>>>>>>>>>>>>>>>>>> there is a need.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Next on materialized views and multi-query optimization, since our
>>>>>>>>>>>>>>>>>>>>> multi-time-point plan naturally involves materializing intermediate results
>>>>>>>>>>>>>>>>>>>>> for later time points, we need to solve the problem of choosing
>>>>>>>>>>>>>>>>>>>>> materializations and include the cost of saving and reusing the
>>>>>>>>>>>>>>>>>>>>> materializations when costing and comparing plans. We borrowed the
>>>>>>>>>>>>>>>>>>>>> multi-query optimization techniques to solve this problem even though we
>>>>>>>>>>>>>>>>>>>>> are looking at a single query. As a result, we think our work is orthogonal
>>>>>>>>>>>>>>>>>>>>> to Calcite's facilities around utilizing existing views, lattice etc. We do
>>>>>>>>>>>>>>>>>>>>> feel that the multi-query optimization component can be adopted for wider
>>>>>>>>>>>>>>>>>>>>> use, but probably need more suggestions from the community.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Lastly, our current implementation is set up in Java code; it should be
>>>>>>>>>>>>>>>>>>>>> straightforward to hook it up with a SQL shell.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Botong
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <jhyde.apache@gmail.com>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Botong,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This is very exciting; congratulations on this research, and thank you
>>>>>>>>>>>>>>>>>>>>>> for contributing it back to Calcite.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The research touches several areas in Calcite: streaming, materialized
>>>>>>>>>>>>>>>>>>>>>> view maintenance, and multi-query optimization. As we have already some
>>>>>>>>>>>>>>>>>>>>>> solutions in those areas (Sigma and Delta relational operators, lattice,
>>>>>>>>>>>>>>>>>>>>>> and Spool operator), it will be interesting to see whether we can make them
>>>>>>>>>>>>>>>>>>>>>> compatible, or whether one concept can subsume others.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Your work differs from streaming queries in that your relations are used
>>>>>>>>>>>>>>>>>>>>>> by “external” user queries, whereas in pure streaming queries, the only
>>>>>>>>>>>>>>>>>>>>>> activity is the change propagation. Did you find that you needed two
>>>>>>>>>>>>>>>>>>>>>> separate cost models - one for “view maintenance” and another for “user
>>>>>>>>>>>>>>>>>>>>>> queries” - since the objectives of each activity are so different?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I wonder whether this work will hasten the arrival of multi-objective
>>>>>>>>>>>>>>>>>>>>>> parametric query optimization [1] in Calcite.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I will make time over the next few days to read and digest your paper.
>>>>>>>>>>>>>>>>>>>>>> Then I expect that we will have a back-and-forth process to create
>>>>>>>>>>>>>>>>>>>>>> something that will be useful for the broader community.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> One thing will be particularly useful: making this functionality
>>>>>>>>>>>>>>>>>>>>>> available from a SQL shell, so that people can experiment with this
>>>>>>>>>>>>>>>>>>>>>> functionality without writing Java code or setting up complex databases and
>>>>>>>>>>>>>>>>>>>>>> metadata. I have in mind something like the simple DDL operations that are
>>>>>>>>>>>>>>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we could devise
>>>>>>>>>>>>>>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Julian
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pkuhbt@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please refer to Fig.
>>>>>>>>>>>>>>>>>>>>>>> 3(a) in our paper:
>>>>>>>>>>>>>>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>> Botong
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <taojiatao@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you open a
>>>>>>>>>>>>>>>>>>>>>>>> JIRA for this, so that people who are interested can subscribe to it?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Regards!
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Aron Tao
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general
>>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer, based on our research paper published in VLDB
>>>>>>>>>>>>>>>>>>>>>>>>> 2021:
>>>>>>>>>>>>>>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental data
>>>>>>>>>>>>>>>>>>>>>>>>> processing
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
>>>>>>>>>>>>>>>>>>>>>>>>> warehouse is planning to use this incremental query optimizer to alleviate
>>>>>>>>>>>>>>>>>>>>>>>>> cluster-wise resource skewness:
>>>>>>>>>>>>>>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
>>>>>>>>>>>>>>>>>>>>>>>>> Computing
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> To our best knowledge, this is the first general cost-based incremental
>>>>>>>>>>>>>>>>>>>>>>>>> optimizer that can find the best plan across multiple families of
>>>>>>>>>>>>>>>>>>>>>>>>> incremental computing methods, including IVM, Streaming, DBToaster, etc.
>>>>>>>>>>>>>>>>>>>>>>>>> Experiments (in the paper) show that the generated best plan is
>>>>>>>>>>>>>>>>>>>>>>>>> consistently much better than the plans from each individual method alone.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> In general, incremental query planning is central to database view
>>>>>>>>>>>>>>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in active
>>>>>>>>>>>>>>>>>>>>>>>>> databases, resumable query execution, approximate query processing, etc. We
>>>>>>>>>>>>>>>>>>>>>>>>> are hoping that this feature can help widen the spectrum of Calcite and
>>>>>>>>>>>>>>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Below is a brief description of the technical details. Please refer to the
>>>>>>>>>>>>>>>>>>>>>>>>> Tempura paper for more details. We are also working on a journal version of
>>>>>>>>>>>>>>>>>>>>>>>>> the paper with more implementation details.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be executed
>>>>>>>>>>>>>>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo will be extended with
>>>>>>>>>>>>>>>>>>>>>>>>> temporal information so that it is capable of generating incremental plans
>>>>>>>>>>>>>>>>>>>>>>>>> that include multiple sub-plans to execute at different time points.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The main idea is to view each table as one that changes over time (Time
>>>>>>>>>>>>>>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we introduced TvrMetaSet into
>>>>>>>>>>>>>>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track related RelSets of a
>>>>>>>>>>>>>>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain time, delta of the
>>>>>>>>>>>>>>>>>>>>>>>>> table between two time points, etc.).
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> For example in the above figure, each vertical line is a TvrMetaSet
>>>>>>>>>>>>>>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
>>>>>>>>>>>>>>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users can write TVR
>>>>>>>>>>>>>>>>>>>>>>>>> Rewrite Rules to describe valid transformations between these dots. For
>>>>>>>>>>>>>>>>>>>>>>>>> example, the blue lines are inter-TVR rules that describe how to compute
>>>>>>>>>>>>>>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The red lines are
>>>>>>>>>>>>>>>>>>>>>>>>> intra-TVR rules that describe transformations within a TVR. All TVR rewrite
>>>>>>>>>>>>>>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still work in the new
>>>>>>>>>>>>>>>>>>>>>>>>> volcano system without modification.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> All changes in this feature will consist of four parts:
>>>>>>>>>>>>>>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>>>>>>>>>>>>>>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
>>>>>>>>>>>>>>>>>>>>>>>>> well as links in between the nodes.
>>>>>>>>>>>>>>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
>>>>>>>>>>>>>>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
>>>>>>>>>>>>>>>>>>>>>>>>> involving multiple time points.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Note that this feature is an extension in nature and thus when disabled,
>>>>>>>>>>>>>>>>>>>>>>>>> does not change any existing Calcite behavior.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Other than scenarios in the paper, we also applied this Calcite-extended
>>>>>>>>>>>>>>>>>>>>>>>>> incremental query optimizer to a type of periodic query called the ‘‘range
>>>>>>>>>>>>>>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost savings of 80% on
>>>>>>>>>>>>>>>>>>>>>>>>> total CPU and memory consumption, and 60% on end-to-end execution time.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>> Botong
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~
>>>>>>>>>>>>>>>>>>> no mistakes
>>>>>>>>>>>>>>>>>>> ~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Viliam Durina
>>>>>>> Jet Developer
>>>>>>>      hazelcast®
>>>>>>> 
>>>>>>>  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
>>>>>>> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
>>>>>>>
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Rui Wang <am...@apache.org>.
I apologize; I had the wrong impression of the meeting time (I thought it
was on Thursday, but it is Wednesday). I can follow up on your meeting
records if you have any.


-Rui

On Tue, May 11, 2021 at 8:17 PM Botong Huang <pk...@gmail.com> wrote:

> Hi all,
>
> This is a reminder that we are going to have our second discussion meeting
> tomorrow at 10pm-11pm PST. Please find the link below; everyone is welcome
> to join!
>
> Join Zoom Meeting
> https://uci.zoom.us/j/91986206610
>
> Meeting ID: 919 8620 6610
> One tap mobile
> +16699006833,,91986206610# US (San Jose)
> +12532158782,,91986206610# US (Tacoma)
>
> Dial by your location
>         +1 669 900 6833 US (San Jose)
>         +1 253 215 8782 US (Tacoma)
>         +1 346 248 7799 US (Houston)
>         +1 301 715 8592 US (Washington DC)
>         +1 312 626 6799 US (Chicago)
>         +1 646 558 8656 US (New York)
> Meeting ID: 919 8620 6610
> Find your local number: https://uci.zoom.us/u/acyXcc43Cd
>
> Join by Skype for Business
> https://uci.zoom.us/skype/91986206610
>
> Thanks,
> Botong
>
> On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:
>
> > Hi Stamatis and all,
> >
> > Thanks for the interest! Let's tentatively schedule the next meeting for
> > next Wednesday, May 12, 10pm-11pm PST, then. Please let us know if any new
> > needs come up.
> >
> > Best,
> > Botong
> >
> > On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
> > wrote:
> >
> >> Hello,
> >>
> >> I really regret missing the first meeting, sorry about that. I added my
> >> preferences in the document.
> >> I will make sure to attend the next one and help as much as I can.
> >>
> >> I didn't have the chance yet to go over the paper but will try to do it
> >> before the next meeting.
> >>
> >> For me the following dates are more convenient than others so it would be
> >> nice if we could arrange it then.
> >>
> >> Thu, May 6, 10pm PST
> >> Tue, May 12, 10pm PST
> >>
> >> Best,
> >> Stamatis
> >>
> >> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >> > I have added my time preferences to the doc [1]. I am generally
> >> > available any evening Mon - Thu. How about we meet Monday 10th May?
> >> >
> >> > Stamatis, Jesus, Given the complexity of this work, I would very much
> >> > appreciate your insight, as experts in optimizer theory. Could one of
> >> > you join the next meeting? Of course we should choose a time that
> >> > works for everyone's schedule.
> >> >
> >> > Julian
> >> >
> >> > [1]
> >> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >> >
> >> > On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com> wrote:
> >> > >
> >> > > We didn't record it; we will try to record the following meetings. Please
> >> > > add your time preference in the doc, so that we can find a meeting time
> >> > > that works for more people.
> >> > >
> >> > > Thanks,
> >> > > Botong
> >> > >
> >> > > On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <viliam@hazelcast.com> wrote:
> >> > >
> >> > > > Is there a recording available?
> >> > > > Viliam
> >> > > >
> >> > > > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com> wrote:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > The meeting yesterday was fun and productive. As discussed, this is the
> >> > > > > call to schedule our second meeting.
> >> > > > >
> >> > > > > We encourage everyone to add their time preferences during 05/01 - 05/15
> >> > > > > here:
> >> > > > >
> >> > > > > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Botong
> >> > > > >
> >> > > > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com> wrote:
> >> > > > >
> >> > > > > > Hi all,
> >> > > > > > We've created a zoom meeting below for our meeting next Monday
> >> > > > > > (9pm-10:30pm PST on 04/26).
> >> > > > > > Talk to you all soon!
> >> > > > > >
> >> > > > > > Join Zoom Meeting
> >> > > > > > https://uci.zoom.us/j/91279732686
> >> > > > > >
> >> > > > > > Meeting ID: 912 7973 2686
> >> > > > > > One tap mobile
> >> > > > > > +16699006833,,91279732686# US (San Jose)
> >> > > > > > +12532158782,,91279732686# US (Tacoma)
> >> > > > > >
> >> > > > > > Dial by your location
> >> > > > > > +1 669 900 6833 US (San Jose)
> >> > > > > > +1 253 215 8782 US (Tacoma)
> >> > > > > > +1 346 248 7799 US (Houston)
> >> > > > > > +1 301 715 8592 US (Washington DC)
> >> > > > > > +1 312 626 6799 US (Chicago)
> >> > > > > > +1 646 558 8656 US (New York)
> >> > > > > > Meeting ID: 912 7973 2686
> >> > > > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> >> > > > > >
> >> > > > > > Join by Skype for Business
> >> > > > > > https://uci.zoom.us/skype/91279732686
> >> > > > > >
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Botong
> >> > > > > >
> >> > > > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pkuhbt@gmail.com> wrote:
> >> > > > > >
> >> > > > > >> Hi all,
> >> > > > > >>
> >> > > > > >> According to the preferences collected, we are tentatively scheduling our
> >> > > > > >> meeting at 9pm-10:30pm PST on Monday, 04/26.
> >> > > > > >>
> >> > > > > >> We will give a presentation about Tempura, followed by a free discussion.
> >> > > > > >>
> >> > > > > >> Please let us know if there are any other requests. A few days before
> >> > > > > >> the meeting, I will send out a Zoom meeting link.
> >> > > > > >>
> >> > > > > >> Thanks,
> >> > > > > >> Botong
> >> > > > > >>
> >> > > > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pkuhbt@gmail.com> wrote:
> >> > > > > >>
> >> > > > > >>> Hi Julian and all,
> >> > > > > >>>
> >> > > > > >>> We've posted the Tempura code base below. Feel free to take a quick peek
> >> > > > > >>> at the last five commits.
> >> > > > > >>>
> >> > > > > >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> >> > > > > >>>
> >> > > > > >>> I've also opened a Jira (CALCITE-4568
> >> > > > > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
> >> > > > > >>> as the umbrella Jira for the feature.
> >> > > > > >>>
> >> > > > > >>> In the meantime, we encourage everyone to enter the time preferences for
> >> > > > > >>> our first meeting here:
> >> > > > > >>>
> >> > > > > >>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >> > > > > >>>
> >> > > > > >>> Thanks,
> >> > > > > >>> Botong
> >> > > > > >>>
> >> > > > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jhyde.apache@gmail.com>
> >> > > > > >>> wrote:
> >> > > > > >>>
> >> > > > > >>>> I have added my time preferences to the doc.
> >> > > > > >>>>
> >> > > > > >>>> Before we meet, could you publish a PR for us to review?
> >> > > > > >>>>
> >> > > > > >>>> Initial discussions will need to be about architecture and
> >> > > > high-level
> >> > > > > >>>> design. So I would ask Calcite reviewers not to review the
> PR
> >> > > > > line-by-line
> >> > > > > >>>> (or to leave comments in GitHub) but try to understand the
> >> > design
> >> > > > > >>>> holistically, and prepare questions/comments before the
> >> meeting.
> >> > > > > >>>>
> >> > > > > >>>> Botong, Can you please create a Calcite JIRA case for this
> >> task?
> >> > > > JIRA
> >> > > > > >>>> is how we track long-running tasks such as this.
> >> > > > > >>>>
> >> > > > > >>>> Julian
> >> > > > > >>>>
> >> > > > > >>>>
> >> > > > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <
> pkuhbt@gmail.com
> >> >
> >> > > > wrote:
> >> > > > > >>>> >
> >> > > > > >>>> > Hi all,
> >> > > > > >>>> >
> >> > > > > >>>> > Apologies for the delay. It took us some time to clean up
> our
> >> > code
> >> > > > > base
> >> > > > > >>>> and
> >> > > > > >>>> > publicly release it (which will be out soon) for a quick
> >> peek.
> >> > > > > >>>> >
> >> > > > > >>>> > We are ready to present our work. Let's schedule a time
> >> for a
> >> > Zoom
> >> > > > > >>>> > meeting and discuss how to integrate Tempura into
> Calcite.
> >> > > > > >>>> >
> >> > > > > >>>> > Since some of our team members are in China, we prefer
> the
> >> > time
> >> > > > slot
> >> > > > > >>>> of
> >> > > > > >>>> > 7:00pm-11:30pm PST any day. I've added our time
> preference
> >> in
> >> > the
> >> > > > > >>>> shared
> >> > > > > >>>> > doc below.
> >> > > > > >>>> >
> >> > > > > >>>>
> >> > > > >
> >> > > >
> >> >
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >> > > > > >>>> >
> >> > > > > >>>> > We encourage everyone to add their time preferences
> (during
> >> > > > > >>>> 04/15-04/30) in
> >> > > > > >>>> > this doc. In a week or so, we will try to settle a time
> >> that
> >> > works
> >> > > > > for
> >> > > > > >>>> > most.
> >> > > > > >>>> >
> >> > > > > >>>> > Thanks,
> >> > > > > >>>> > Botong
> >> > > > > >>>> >
> >> > > > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> >> > pkuhbt@gmail.com>
> >> > > > > >>>> wrote:
> >> > > > > >>>> >
> >> > > > > >>>> >> Hi Julian and Rui,
> >> > > > > >>>> >>
> >> > > > > >>>> >> Sounds good to us. Please give us some time to prepare
> >> some
> >> > > > slides
> >> > > > > >>>> for the
> >> > > > > >>>> >> meeting.
> >> > > > > >>>> >>
> >> > > > > >>>> >> I've created a doc below for discussion. Please feel
> free
> >> to
> >> > add
> >> > > > > >>>> more in
> >> > > > > >>>> >> here:
> >> > > > > >>>> >>
> >> > > > > >>>> >>
> >> > > > > >>>>
> >> > > > >
> >> > > >
> >> >
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >> > > > > >>>> >>
> >> > > > > >>>> >> Thanks,
> >> > > > > >>>> >> Botong
> >> > > > > >>>> >>
> >> > > > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> >> > > > > jhyde.apache@gmail.com
> >> > > > > >>>> >
> >> > > > > >>>> >> wrote:
> >> > > > > >>>> >>
> >> > > > > >>>> >>> PS The “editable doc” that Rui refers to is also a good
> >> > idea. I
> >> > > > > >>>> think we
> >> > > > > >>>> >>> should create it to continue discussion after the first
> >> > meeting.
> >> > > > > >>>> >>>
> >> > > > > >>>> >>> Julian
> >> > > > > >>>> >>>
> >> > > > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> >> > > > > jhyde.apache@gmail.com>
> >> > > > > >>>> >>> wrote:
> >> > > > > >>>> >>>>
> >> > > > > >>>> >>>> I think good next steps would be a PR and a meeting.
> >> The
> >> > PR
> >> > > > will
> >> > > > > >>>> allow
> >> > > > > >>>> >>> us to read the code, but I think we should do the first
> >> > round of
> >> > > > > >>>> questions
> >> > > > > >>>> >>> at the meeting.  The meeting could perhaps start with a
> >> > > > > >>>> presentation of the
> >> > > > > >>>> >>> paper (do you have some slides you are planning to
> >> present
> >> > at
> >> > > > > VLDB,
> >> > > > > >>>> >>> Botong?) and then move on to questions about the
> >> concepts,
> >> > which
> >> > > > > >>>> >>> alternatives were considered, and how the concepts map
> >> onto
> >> > > > other
> >> > > > > >>>> current
> >> > > > > >>>> >>> and future concepts in calcite.
> >> > > > > >>>> >>>>
> >> > > > > >>>> >>>> I don’t think we should start “reviewing” the PR
> >> > line-by-line
> >> > > > at
> >> > > > > >>>> this
> >> > > > > >>>> >>> point. We need to understand the high-level concepts
> and
> >> > design
> >> > > > > >>>> choices. If
> >> > > > > >>>> >>> we start reviewing the PR we will get lost in the
> >> details.
> >> > > > > >>>> >>>>
> >> > > > > >>>> >>>> I know that integrating a major change is hard; I
> doubt
> >> > that we
> >> > > > > >>>> will be
> >> > > > > >>>> >>> able to integrate everything, but we can build
> >> understanding
> >> > > > about
> >> > > > > >>>> where
> >> > > > > >>>> >>> calcite needs to go, and I hope integrate a good amount
> >> of
> >> > code
> >> > > > to
> >> > > > > >>>> help us
> >> > > > > >>>> >>> get there.
> >> > > > > >>>> >>>>
> >> > > > > >>>> >>>> As I said before, after the integration I would like
> >> > people to
> >> > > > be
> >> > > > > >>>> able
> >> > > > > >>>> >>> to experiment with it and use it in their production
> >> > systems.
> >> > > > > That
> >> > > > > >>>> way, it
> >> > > > > >>>> >>> will not be an experiment that withers, but a feature
> set
> >> > > > > >>>> that integrates with
> >> > > > > >>>> >>> other calcite features and gets stronger over time.
> >> > > > > >>>> >>>>
> >> > > > > >>>> >>>> Julian
> >> > > > > >>>> >>>>
> >> > > > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> >> > amaliujia@apache.org>
> >> > > > > >>>> wrote:
> >> > > > > >>>> >>>>>
> >> > > > > >>>> >>>>> For me to participate in the discussion for the
> above
> >> > > > > questions,
> >> > > > > >>>> I
> >> > > > > >>>> >>> will
> >> > > > > >>>> >>>>> need to read a lot more to know relevant context and
> >> > likely
> >> > > > ask
> >> > > > > >>>> lots of
> >> > > > > >>>> >>>>> questions :-).  An editable doc is probably good for
> >> > questions
> >> > > > > and
> >> > > > > >>>> back
> >> > > > > >>>> >>> and
> >> > > > > >>>> >>>>> forward discussion.
> >> > > > > >>>> >>>>>
> >> > > > > >>>> >>>>>
> >> > > > > >>>> >>>>> -Rui
> >> > > > > >>>> >>>>>
> >> > > > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> >> > > > > amaliujia@apache.org
> >> > > > > >>>> >
> >> > > > > >>>> >>> wrote:
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>> I am also happy to help push this work into Calcite
> >> > (review
> >> > > > > code
> >> > > > > >>>> and
> >> > > > > >>>> >>> doc,
> >> > > > > >>>> >>>>>> etc.).
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>> While you can share your code so people can have
> more
> >> > idea
> >> > > > how
> >> > > > > >>>> it is
> >> > > > > >>>> >>>>>> implemented, I think it would be also nice to have a
> >> doc
> >> > to
> >> > > > > >>>> discuss
> >> > > > > >>>> >>> open
> >> > > > > >>>> >>>>>> questions above. Some points that I am copying
> >> here:
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>> 1. Can this solution be compatible with existing
> >> > solutions in
> >> > > > > >>>> Calcite
> >> > > > > >>>> >>>>>> Streaming, materialized view maintenance, and
> >> multi-query
> >> > > > > >>>> optimization
> >> > > > > >>>> >>>>>> (Sigma and Delta relational operators, lattice, and
> >> Spool
> >> > > > > >>>> operator),
> >> > > > > >>>> >>>>>> 2. Did you find that you needed two separate cost
> >> models
> >> > -
> >> > > > one
> >> > > > > >>>> for
> >> > > > > >>>> >>> “view
> >> > > > > >>>> >>>>>> maintenance” and another for “user queries” - since
> >> the
> >> > > > > >>>> objectives of
> >> > > > > >>>> >>> each
> >> > > > > >>>> >>>>>> activity are so different?
> >> > > > > >>>> >>>>>> 3. whether this work will hasten the arrival of
> >> > > > multi-objective
> >> > > > > >>>> >>> parametric
> >> > > > > >>>> >>>>>> query optimization [1] in Calcite.
> >> > > > > >>>> >>>>>> 4. probably SQL shell support.
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>> [1]:
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>
> >> > > > > >>>>
> >> > > > >
> >> > > >
> >> >
> >>
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>> -Rui
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> >> > zinking3@gmail.com>
> >> > > > > >>>> wrote:
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>> it would be very nice to see a POC of your work.
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> >> > > > > >>>> pkuhbt@gmail.com>
> >> > > > > >>>> >>> wrote:
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>>> Hi Julian,
> >> > > > > >>>> >>>>>>>>
> >> > > > > >>>> >>>>>>>> Just wondering if there are any updates? We are
> >> > wondering
> >> > > > if
> >> > > > > it
> >> > > > > >>>> >>> would
> >> > > > > >>>> >>>>>>> help
> >> > > > > >>>> >>>>>>>> to post our code for a quick preview.
> >> > > > > >>>> >>>>>>>>
> >> > > > > >>>> >>>>>>>> Thanks,
> >> > > > > >>>> >>>>>>>> Botong
> >> > > > > >>>> >>>>>>>>
> >> > > > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> >> > > > > pkuhbt@gmail.com
> >> > > > > >>>> >
> >> > > > > >>>> >>> wrote:
> >> > > > > >>>> >>>>>>>>
> >> > > > > >>>> >>>>>>>>> Hi Julian,
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a
> >> plan
> >> > > > that
> >> > > > > >>>> best
> >> > > > > >>>> >>>>>>> benefits
> >> > > > > >>>> >>>>>>>>> the community. Here are some clarifications that
> >> > hopefully
> >> > > > > >>>> answer
> >> > > > > >>>> >>> your
> >> > > > > >>>> >>>>>>>>> questions.
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of
> >> time
> >> > > > points
> >> > > > > to
> >> > > > > >>>> >>>>>>> consider
> >> > > > > >>>> >>>>>>>>> running and a cost function that expresses users'
> >> > > > preference
> >> > > > > >>>> over
> >> > > > > >>>> >>>>>>> time,
> >> > > > > >>>> >>>>>>>>> Tempura will generate the best incremental plan
> >> that
> >> > > > > >>>> minimizes the
> >> > > > > >>>> >>>>>>>> overall
> >> > > > > >>>> >>>>>>>>> cost function.
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at
> >> different
> >> > time
> >> > > > > >>>> points
> >> > > > > >>>> >>> can
> >> > > > > >>>> >>>>>>> be
> >> > > > > >>>> >>>>>>>>> different from each other, as opposed to
> identical
> >> > plans
> >> > > > in
> >> > > > > >>>> all
> >> > > > > >>>> >>> delta
> >> > > > > >>>> >>>>>>>> runs
> >> > > > > >>>> >>>>>>>>> as in streaming or IVM. As mentioned in Section 2.1 of
> the
> >> > > > Tempura
> >> > > > > >>>> paper,
> >> > > > > >>>> >>> we
> >> > > > > >>>> >>>>>>> can
> >> > > > > >>>> >>>>>>>>> mimic the current streaming implementation by
> >> > specifying
> >> > > > two
> >> > > > > >>>> >>> (logical)
> >> > > > > >>>> >>>>>>>> time
> >> > > > > >>>> >>>>>>>>> points in Tempura, representing the initial run
> and
> >> > later
> >> > > > > >>>> delta
> >> > > > > >>>> >>> runs
> >> > > > > >>>> >>>>>>>>> respectively. In general, note that Tempura
> >> supports
> >> > > > various
> >> > > > > >>>> forms
> >> > > > > >>>> >>> of
> >> > > > > >>>> >>>>>>>>> incremental computing, not only the small-delta
> >> > > > append-only
> >> > > > > >>>> data
> >> > > > > >>>> >>>>>>> model in
> >> > > > > >>>> >>>>>>>>> streaming systems. That's why we believe Tempura
> >> > subsumes
> >> > > > > the
> >> > > > > >>>> >>> current
> >> > > > > >>>> >>>>>>>>> streaming support, as well as any IVM
> >> implementations.
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> About the cost model, we did not come up with a
> >> > seperate
> >> > > > > cost
> >> > > > > >>>> >>> model,
> >> > > > > >>>> >>>>>>> but
> >> > > > > >>>> >>>>>>>>> rather extended the existing one. Similar to
> >> > > > multi-objective
> >> > > > > >>>> >>>>>>>> optimization,
> >> > > > > >>>> >>>>>>>>> costs incurred at different time points are
> >> considered
> >> > > > > >>>> different
> >> > > > > >>>> >>>>>>>>> dimensions. Tempura lets users supply a function
> >> that
> >> > > > > >>>> converts this
> >> > > > > >>>> >>>>>>> cost
> >> > > > > >>>> >>>>>>>>> vector into a final cost. So under this function,
> >> any
> >> > two
> >> > > > > >>>> >>> incremental
> >> > > > > >>>> >>>>>>>> plans
> >> > > > > >>>> >>>>>>>>> are still comparable and there is an overall
> >> optimum.
> >> > I
> >> > > > > guess
> >> > > > > >>>> we
> >> > > > > >>>> >>> can
> >> > > > > >>>> >>>>>>> go
> >> > > > > >>>> >>>>>>>>> down the route of multi-objective parametric
> query
> >> > > > > >>>> optimization
> >> > > > > >>>> >>>>>>> instead
> >> > > > > >>>> >>>>>>>> if
> >> > > > > >>>> >>>>>>>>> there is a need.
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> Next on materialized views and multi-query
> >> > optimization,
> >> > > > > >>>> since our
> >> > > > > >>>> >>>>>>>>> multi-time-point plan naturally involves
> >> materializing
> >> > > > > >>>> intermediate
> >> > > > > >>>> >>>>>>>> results
> >> > > > > >>>> >>>>>>>>> for later time points, we need to solve the
> >> problem of
> >> > > > > >>>> choosing
> >> > > > > >>>> >>>>>>>>> materializations and include the cost of saving
> and
> >> > > > reusing
> >> > > > > >>>> the
> >> > > > > >>>> >>>>>>>>> materializations when costing and comparing
> plans.
> >> We
> >> > > > > >>>> borrowed the
> >> > > > > >>>> >>>>>>>>> multi-query optimization techniques to solve this
> >> > problem
> >> > > > > even
> >> > > > > >>>> >>> though
> >> > > > > >>>> >>>>>>> we
> >> > > > > >>>> >>>>>>>>> are looking at a single query. As a result, we
> >> think
> >> > our
> >> > > > > work
> >> > > > > >>>> is
> >> > > > > >>>> >>>>>>>> orthogonal
> >> > > > > >>>> >>>>>>>>> to Calcite's facilities around utilizing existing
> >> > views,
> >> > > > > >>>> lattice
> >> > > > > >>>> >>> etc.
> >> > > > > >>>> >>>>>>> We
> >> > > > > >>>> >>>>>>>> do
> >> > > > > >>>> >>>>>>>>> feel that the multi-query optimization component
> >> can
> >> > be
> >> > > > > >>>> adopted to
> >> > > > > >>>> >>>>>>> wider
> >> > > > > >>>> >>>>>>>>> use, but probably need more suggestions from the
> >> > > > community.
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> Lastly, our current implementation is set up in
> >> java
> >> > code,
> >> > > > > it
> >> > > > > >>>> >>> should
> >> > > > > >>>> >>>>>>> be
> >> > > > > >>>> >>>>>>>>> straightforward to hook it up with SQL shell.
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> Thanks,
> >> > > > > >>>> >>>>>>>>> Botong
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> >> > > > > >>>> >>> jhyde.apache@gmail.com>
> >> > > > > >>>> >>>>>>>>> wrote:
> >> > > > > >>>> >>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> Botong,
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> This is very exciting; congratulations on this
> >> > research,
> >> > > > > and
> >> > > > > >>>> thank
> >> > > > > >>>> >>>>>>> you
> >> > > > > >>>> >>>>>>>>>> for contributing it back to Calcite.
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> The research touches several areas in Calcite:
> >> > streaming,
> >> > > > > >>>> >>>>>>> materialized
> >> > > > > >>>> >>>>>>>>>> view maintenance, and multi-query optimization.
> >> As we
> >> > > > have
> >> > > > > >>>> already
> >> > > > > >>>> >>>>>>> some
> >> > > > > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta
> >> relational
> >> > > > > >>>> operators,
> >> > > > > >>>> >>>>>>> lattice,
> >> > > > > >>>> >>>>>>>>>> and Spool operator), it will be interesting to
> see
> >> > > > whether
> >> > > > > >>>> we can
> >> > > > > >>>> >>>>>>> make
> >> > > > > >>>> >>>>>>>> them
> >> > > > > >>>> >>>>>>>>>> compatible, or whether one concept can subsume
> >> > others.
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> Your work differs from streaming queries in that
> >> your
> >> > > > > >>>> relations
> >> > > > > >>>> >>> are
> >> > > > > >>>> >>>>>>> used
> >> > > > > >>>> >>>>>>>>>> by “external” user queries, whereas in pure
> >> streaming
> >> > > > > >>>> queries, the
> >> > > > > >>>> >>>>>>> only
> >> > > > > >>>> >>>>>>>>>> activity is the change propagation. Did you find
> >> > that you
> >> > > > > >>>> needed
> >> > > > > >>>> >>> two
> >> > > > > >>>> >>>>>>>>>> separate cost models - one for “view
> maintenance”
> >> and
> >> > > > > >>>> another for
> >> > > > > >>>> >>>>>>> “user
> >> > > > > >>>> >>>>>>>>>> queries” - since the objectives of each activity
> >> are
> >> > so
> >> > > > > >>>> different?
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> I wonder whether this work will hasten the
> >> arrival of
> >> > > > > >>>> >>> multi-objective
> >> > > > > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> I will make time over the next few days to read
> >> and
> >> > > > digest
> >> > > > > >>>> your
> >> > > > > >>>> >>>>>>> paper.
> >> > > > > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth
> >> > process
> >> > > > to
> >> > > > > >>>> create
> >> > > > > >>>> >>>>>>>>>> something that will be useful for the broader
> >> > community.
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> One thing will be particularly useful: making
> this
> >> > > > > >>>> functionality
> >> > > > > >>>> >>>>>>>>>> available from a SQL shell, so that people can
> >> > experiment
> >> > > > > >>>> with
> >> > > > > >>>> >>> this
> >> > > > > >>>> >>>>>>>>>> functionality without writing Java code or
> >> setting up
> >> > > > > complex
> >> > > > > >>>> >>>>>>> databases
> >> > > > > >>>> >>>>>>>> and
> >> > > > > >>>> >>>>>>>>>> metadata. I have in mind something like the
> simple
> >> > DDL
> >> > > > > >>>> operations
> >> > > > > >>>> >>>>>>> that
> >> > > > > >>>> >>>>>>>> are
> >> > > > > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> >> > whether
> >> > > > we
> >> > > > > >>>> could
> >> > > > > >>>> >>>>>>> devise
> >> > > > > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> Julian
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>> [1]
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>
> >> > > > > >>>>
> >> > > > >
> >> > > >
> >> >
> >>
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> >> > > > > pkuhbt@gmail.com
> >> > > > > >>>> >
> >> > > > > >>>> >>>>>>> wrote:
> >> > > > > >>>> >>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the
> >> > figure,
> >> > > > > please
> >> > > > > >>>> >>> refer
> >> > > > > >>>> >>>>>>> to
> >> > > > > >>>> >>>>>>>>>> Fig
> >> > > > > >>>> >>>>>>>>>>> 3(a) in our paper:
> >> > > > > >>>> >>>>>>>>>>
> >> > https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >> > > > > >>>> >>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>> Best,
> >> > > > > >>>> >>>>>>>>>>> Botong
> >> > > > > >>>> >>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> >> > > > > >>>> taojiatao@gmail.com>
> >> > > > > >>>> >>>>>>>> wrote:
> >> > > > > >>>> >>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>> Seems interesting, the pic can not be seen in
> >> the
> >> > mail,
> >> > > > > >>>> may you
> >> > > > > >>>> >>>>>>> open
> >> > > > > >>>> >>>>>>>> a
> >> > > > > >>>> >>>>>>>>>> JIRA
> >> > > > > >>>> >>>>>>>>>>>> for this, people who are interested in this
> can
> >> > > > subscribe
> >> > > > > >>>> to the
> >> > > > > >>>> >>>>>>>> JIRA?
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>> Regards!
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>> Aron Tao
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> Hi all,
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite
> >> optimizer
> >> > > > into
> >> > > > > a
> >> > > > > >>>> >>> general
> >> > > > > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our
> >> research
> >> > > > paper
> >> > > > > >>>> >>>>>>> published
> >> > > > > >>>> >>>>>>>> in
> >> > > > > >>>> >>>>>>>>>>>> VLDB
> >> > > > > >>>> >>>>>>>>>>>>> 2021:
> >> > > > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer
> >> framework
> >> > for
> >> > > > > >>>> >>> incremental
> >> > > > > >>>> >>>>>>>> data
> >> > > > > >>>> >>>>>>>>>>>>> processing
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020
> illustrating
> >> > how
> >> > > > > >>>> Alibaba’s
> >> > > > > >>>> >>>>>>> data
> >> > > > > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental
> >> > query
> >> > > > > >>>> optimizer
> >> > > > > >>>> >>> to
> >> > > > > >>>> >>>>>>>>>>>> alleviate
> >> > > > > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
> >> > > > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
> >> > Resource-Aware
> >> > > > > >>>> >>> Incremental
> >> > > > > >>>> >>>>>>>>>>>> Computing
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first
> >> general
> >> > > > > >>>> cost-based
> >> > > > > >>>> >>>>>>>>>> incremental
> >> > > > > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across
> >> > multiple
> >> > > > > >>>> families
> >> > > > > >>>> >>> of
> >> > > > > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
> >> > > > Streaming,
> >> > > > > >>>> >>>>>>> DBToaster,
> >> > > > > >>>> >>>>>>>>>> etc.
> >> > > > > >>>> >>>>>>>>>>>>> Experiments (in the paper) show that the
> >> > generated
> >> > > > best
> >> > > > > >>>> plan
> >> > > > > >>>> >>> is
> >> > > > > >>>> >>>>>>>>>>>>> consistently much better than the plans from
> >> each
> >> > > > > >>>> individual
> >> > > > > >>>> >>>>>>> method
> >> > > > > >>>> >>>>>>>>>>>> alone.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> In general, incremental query planning is
> >> central
> >> > to
> >> > > > > >>>> database
> >> > > > > >>>> >>>>>>> view
> >> > > > > >>>> >>>>>>>>>>>>> maintenance and stream processing systems,
> and
> >> is
> >> > > > being
> >> > > > > >>>> >>> adopted
> >> > > > > >>>> >>>>>>> in
> >> > > > > >>>> >>>>>>>>>>>> active
> >> > > > > >>>> >>>>>>>>>>>>> databases, resumable query execution,
> >> approximate
> >> > > > query
> >> > > > > >>>> >>>>>>> processing,
> >> > > > > >>>> >>>>>>>>>> etc.
> >> > > > > >>>> >>>>>>>>>>>> We
> >> > > > > >>>> >>>>>>>>>>>>> are hoping that this feature can help
> widen
> >> the
> >> > > > > >>>> spectrum of
> >> > > > > >>>> >>>>>>>>>> Calcite,
> >> > > > > >>>> >>>>>>>>>>>>> solicit more use cases and adoption of
> Calcite.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical
> >> > details.
> >> > > > > >>>> Please
> >> > > > > >>>> >>>>>>> refer
> >> > > > > >>>> >>>>>>>> to
> >> > > > > >>>> >>>>>>>>>>>> the
> >> > > > > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also
> >> > working
> >> > > > on a
> >> > > > > >>>> >>> journal
> >> > > > > >>>> >>>>>>>>>> version
> >> > > > > >>>> >>>>>>>>>>>> of
> >> > > > > >>>> >>>>>>>>>>>>> the paper with more implementation details.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite
> >> is
> >> > meant
> >> > > > > to
> >> > > > > >>>> be
> >> > > > > >>>> >>>>>>>> executed
> >> > > > > >>>> >>>>>>>>>>>>> altogether at once. In the proposal,
> Calcite’s
> >> > memo
> >> > > > will
> >> > > > > >>>> be
> >> > > > > >>>> >>>>>>> extended
> >> > > > > >>>> >>>>>>>>>> with
> >> > > > > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
> >> > > > generating
> >> > > > > >>>> >>>>>>> incremental
> >> > > > > >>>> >>>>>>>>>>>> plans
> >> > > > > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
> >> > > > different
> >> > > > > >>>> time
> >> > > > > >>>> >>>>>>> points.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one
> that
> >> > > > changes
> >> > > > > >>>> over
> >> > > > > >>>> >>> time
> >> > > > > >>>> >>>>>>>>>> (Time
> >> > > > > >>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> >> > > > introduced
> >> > > > > >>>> >>>>>>> TvrMetaSet
> >> > > > > >>>> >>>>>>>>>> into
> >> > > > > >>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset
> to
> >> > track
> >> > > > > >>>> related
> >> > > > > >>>> >>>>>>> RelSets
> >> > > > > >>>> >>>>>>>>>> of a
> >> > > > > >>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at
> >> > certain
> >> > > > > >>>> time,
> >> > > > > >>>> >>>>>>> delta of
> >> > > > > >>>> >>>>>>>>>> the
> >> > > > > >>>> >>>>>>>>>>>>> table between two time points, etc.).
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> [image: image.png]
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> For example in the above figure, each
> vertical
> >> > line
> >> > > > is a
> >> > > > > >>>> >>>>>>> TvrMetaSet
> >> > > > > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join
> R,
> >> > etc.).
> >> > > > > >>>> >>> Horizontal
> >> > > > > >>>> >>>>>>>> lines
> >> > > > > >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid
> is a
> >> > > > RelSet.
> >> > > > > >>>> Users
> >> > > > > >>>> >>> can
> >> > > > > >>>> >>>>>>>>>> write
> >> > > > > >>>> >>>>>>>>>>>> TVR
> >> > > > > >>>> >>>>>>>>>>>>> Rewrite Rules to describe valid
> transformations
> >> > > > between
> >> > > > > >>>> these
> >> > > > > >>>> >>>>>>> dots.
> >> > > > > >>>> >>>>>>>>>> For
> >> > > > > >>>> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules
> >> that
> >> > > > > >>>> describe how
> >> > > > > >>>> >>> to
> >> > > > > >>>> >>>>>>>>>> compute
> >> > > > > >>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other
> >> > TVRs.
> >> > > > The
> >> > > > > >>>> red
> >> > > > > >>>> >>> lines
> >> > > > > >>>> >>>>>>>> are
> >> > > > > >>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations
> >> > within a
> >> > > > > >>>> TVR. All
> >> > > > > >>>> >>>>>>> TVR
> >> > > > > >>>> >>>>>>>>>>>> rewrite
> >> > > > > >>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite
> >> > rules
> >> > > > > still
> >> > > > > >>>> work
> >> > > > > >>>> >>> in
> >> > > > > >>>> >>>>>>>> the
> >> > > > > >>>> >>>>>>>>>> new
> >> > > > > >>>> >>>>>>>>>>>>> volcano system without modification.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of
> >> four
> >> > > > parts:
> >> > > > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> >> > > > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching
> >> > TvrMetaSet
> >> > > > > and
> >> > > > > >>>> >>>>>>> RelNodes,
> >> > > > > >>>> >>>>>>>>>> as
> >> > > > > >>>> >>>>>>>>>>>>> well as links in between the nodes.
> >> > > > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the
> >> > upgraded
> >> > > > > >>>> rule
> >> > > > > >>>> >>>>>>> engine
> >> > > > > >>>> >>>>>>>>>> API.
> >> > > > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the
> >> best
> >> > > > > >>>> incremental
> >> > > > > >>>> >>>>>>> plan
> >> > > > > >>>> >>>>>>>>>>>>> involving multiple time points.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in
> >> nature
> >> > and
> >> > > > > thus
> >> > > > > >>>> when
> >> > > > > >>>> >>>>>>>>>> disabled,
> >> > > > > >>>> >>>>>>>>>>>>> does not change any existing Calcite
> behavior.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also
> >> applied
> >> > > > this
> >> > > > > >>>> >>>>>>>>>> Calcite-extended
> >> > > > > >>>> >>>>>>>>>>>>> incremental query optimizer to a type of
> >> periodic
> >> > > > query
> >> > > > > >>>> called
> >> > > > > >>>> >>>>>>> the
> >> > > > > >>>> >>>>>>>>>>>> ‘‘range
> >> > > > > >>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It
> >> achieved
> >> > cost
> >> > > > > >>>> savings
> >> > > > > >>>> >>> of
> >> > > > > >>>> >>>>>>> 80%
> >> > > > > >>>> >>>>>>>>>> on
> >> > > > > >>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> >> > > > end-to-end
> >> > > > > >>>> >>> execution
> >> > > > > >>>> >>>>>>>>>> time.
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome.
> >> Thanks
> >> > and
> >> > > > > happy
> >> > > > > >>>> >>>>>>> holidays!
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>> Best,
> >> > > > > >>>> >>>>>>>>>>>>> Botong
> >> > > > > >>>> >>>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>>>
> >> > > > > >>>> >>>>>>>>
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>> --
> >> > > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> >> > > > > >>>> >>>>>>> no mistakes
> >> > > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> >> > > > > >>>> >>>>>>>
> >> > > > > >>>> >>>>>>
> >> > > > > >>>> >>>
> >> > > > > >>>> >>
> >> > > > > >>>>
> >> > > > > >>>>
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Viliam Durina
> >> > > > Jet Developer
> >> > > >       hazelcast®
> >> > > >
> >> > > >   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo,
> CA
> >> > 94402 |
> >> > > > USA
> >> > > > +1 (650) 521-5453 <(650)%20521-5453> | hazelcast.com <
> https://www.hazelcast.com>
> >> > > >
> >> > > >
> >> >
> >>
> >
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi all,

This is a reminder that we are going to have our second discussion meeting
tomorrow at 10-11pm PST. Please find the link below. Everyone is welcome to
join!

Join Zoom Meeting
https://uci.zoom.us/j/91986206610

Meeting ID: 919 8620 6610
One tap mobile
+16699006833,,91986206610# US (San Jose)
+12532158782,,91986206610# US (Tacoma)

Dial by your location
        +1 669 900 6833 US (San Jose)
        +1 253 215 8782 US (Tacoma)
        +1 346 248 7799 US (Houston)
        +1 301 715 8592 US (Washington DC)
        +1 312 626 6799 US (Chicago)
        +1 646 558 8656 US (New York)
Meeting ID: 919 8620 6610
Find your local number: https://uci.zoom.us/u/acyXcc43Cd

Join by Skype for Business
https://uci.zoom.us/skype/91986206610

Thanks,
Botong

On Wed, May 5, 2021 at 9:55 AM Botong Huang <pk...@gmail.com> wrote:

> Hi Stamatis and all,
>
> Thanks for the interest! Let's tentatively schedule the next meeting for
> next Wednesday, May 12, 10pm-11pm PST then. Please let us know if any new
> needs come up.
>
> Best,
> Botong
>
> On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I really regret missing the first meeting, sorry about that. I added my
>> preferences in the document.
>> I will make sure to attend the next one and help as much as I can.
>>
>> I didn't have the chance yet to go over the paper but will try to do it
>> before the next meeting.
>>
>> For me the following dates are more convenient than others so it would be
>> nice if we could arrange it then.
>>
>> Thu, May 6, 10pm PST
>> Tue, May 12, 10pm PST
>>
>> Best,
>> Stamatis
>>
>> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
>>
>> > I have added my time preferences to the doc [1]. I am generally
>> > available any evening Mon - Thu. How about we meet Monday 10th May?
>> >
>> > Stamatis, Jesus, Given the complexity of this work, I would very much
>> > appreciate your insight, as experts in optimizer theory. Could one of
>> > you join the next meeting? Of course we should choose a time that
>> > works for everyone's schedule.
>> >
>> > Julian
>> >
>> > [1]
>> >
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >
>> > On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com> wrote:
>> > >
>> > > We didn't record it, but we will try to record the following meetings.
>> > > Please add your time preferences in the doc so that we can find a
>> > > meeting time that works for more people.
>> > >
>> > > Thanks,
>> > > Botong
>> > >
>> > > On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vi...@hazelcast.com>
>> > wrote:
>> > >
>> > > > Is there a recording available?
>> > > > Viliam
>> > > >
>> > > > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com>
>> wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > The meeting yesterday was fun and productive. As discussed, this
>> is
>> > the
>> > > > > call to schedule our second meeting.
>> > > > >
>> > > > > We encourage everyone to add their time preferences during 05/01 -
>> > 05/15
>> > > > > here:
>> > > > >
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> > > > >
>> > > > > Thanks,
>> > > > > Botong
>> > > > >
>> > > > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > > We've created a zoom meeting below for our meeting next Monday
>> > > > > > (9pm-10:30pm PST on 04/26).
>> > > > > > Talk to you all soon!
>> > > > > >
>> > > > > > Join Zoom Meeting
>> > > > > > https://uci.zoom.us/j/91279732686
>> > > > > >
>> > > > > >
>> > > > > > Meeting ID: 912 7973 2686
>> > > > > > One tap mobile
>> > > > > > +16699006833,,91279732686# US (San Jose)
>> > > > > > +12532158782,,91279732686# US (Tacoma)
>> > > > > >
>> > > > > > Dial by your location
>> > > > > > +1 669 900 6833 US (San Jose)
>> > > > > > +1 253 215 8782 US (Tacoma)
>> > > > > > +1 346 248 7799 US (Houston)
>> > > > > > +1 301 715 8592 US (Washington DC)
>> > > > > > +1 312 626 6799 US (Chicago)
>> > > > > > +1 646 558 8656 US (New York)
>> > > > > > Meeting ID: 912 7973 2686
>> > > > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
>> > > > > >
>> > > > > >
>> > > > > > Join by Skype for Business
>> > > > > > https://uci.zoom.us/skype/91279732686
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Botong
>> > > > > >
>> > > > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pkuhbt@gmail.com
>> >
>> > > > wrote:
>> > > > > >
>> > > > > >> Hi all,
>> > > > > >>
>> > > > > >> According to the preferences collected, we are tentatively
>> > scheduling
>> > > > > our
>> > > > > >> meeting at 9pm-10:30pm PST on 04/26 Monday.
>> > > > > >>
>> > > > > >> We will give a presentation about Tempura, followed by a free
>> > > > > discussion.
>> > > > > >>
>> > > > > >> Please let us know if there are any other requests. A few days before
>> > > > > >> the meeting, I will send out a Zoom meeting link.
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> Botong
>> > > > > >>
>> > > > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com>
>> > wrote:
>> > > > > >>
>> > > > > >>> Hi Julian and all,
>> > > > > >>>
>> > > > > >>> We've posted the Tempura code base below. Feel free to take a
>> > quick
>> > > > > peek
>> > > > > >>> at the last five commits.
>> > > > > >>>
>> > > > >
>> >
>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>> > > > > >>>
>> > > > > >>> I've also opened a Jira (CALCITE-4568
>> > > > > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which
>> > will
>> > > > > serve
>> > > > > >>> as the umbrella Jira for the feature.
>> > > > > >>>
>> > > > > >>> In the meantime, we encourage everyone to enter the time
>> > preferences
>> > > > > for
>> > > > > >>> our first meeting here:
>> > > > > >>>
>> > > > > >>>
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> > > > > >>>
>> > > > > >>> Thanks,
>> > > > > >>> Botong
>> > > > > >>>
>> > > > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
>> > jhyde.apache@gmail.com>
>> > > > > >>> wrote:
>> > > > > >>>
>> > > > > >>>> I have added my time preferences to the doc.
>> > > > > >>>>
>> > > > > >>>> Before we meet, could you publish a PR for us to review?
>> > > > > >>>>
>> > > > > >>>> Initial discussions will need to be about architecture and
>> > > > high-level
>> > > > > >>>> design. So I would ask Calcite reviewers not to review the PR
>> > > > > line-by-line
>> > > > > >>>> (or to leave comments in GitHub) but try to understand the
>> > design
>> > > > > >>>> holistically, and prepare questions/comments before the
>> meeting.
>> > > > > >>>>
>> > > > > >>>> Botong, can you please create a Calcite JIRA case for this task?
>> > > > > >>>> JIRA is how we track long-running tasks such as this.
>> > > > > >>>>
>> > > > > >>>> Julian
>> > > > > >>>>
>> > > > > >>>>
>> > > > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pkuhbt@gmail.com
>> >
>> > > > wrote:
>> > > > > >>>> >
>> > > > > >>>> > Hi all,
>> > > > > >>>> >
>> > > > > >>>> > Apologies for the delay. It took us some time to clean up our
>> > > > > >>>> > code base and publicly release it (which will be out soon) for
>> > > > > >>>> > a quick peek.
>> > > > > >>>> >
>> > > > > >>>> > We are ready to present our work. Let's schedule a time
>> for a
>> > Zoom
>> > > > > >>>> > meeting and discuss how to integrate Tempura into Calcite.
>> > > > > >>>> >
>> > > > > >>>> > Since some of our team members are in China, we prefer the
>> > time
>> > > > slot
>> > > > > >>>> of
>> > > > > >>>> > 7:00pm-11:30pm PST any day. I've added our time preference
>> in
>> > the
>> > > > > >>>> shared
>> > > > > >>>> > doc below.
>> > > > > >>>> >
>> > > > > >>>>
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> > > > > >>>> >
>> > > > > >>>> > We encourage everyone to add their time preferences (during
>> > > > > >>>> 04/15-04/30) in
>> > > > > >>>> > this doc. In a week or so, we will try to settle a time
>> that
>> > works
>> > > > > for
>> > > > > >>>> > most.
>> > > > > >>>> >
>> > > > > >>>> > Thanks,
>> > > > > >>>> > Botong
>> > > > > >>>> >
>> > > > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
>> > pkuhbt@gmail.com>
>> > > > > >>>> wrote:
>> > > > > >>>> >
>> > > > > >>>> >> Hi Julian and Rui,
>> > > > > >>>> >>
>> > > > > >>>> >> Sounds good to us. Please give us some time to prepare
>> some
>> > > > slides
>> > > > > >>>> for the
>> > > > > >>>> >> meeting.
>> > > > > >>>> >>
>> > > > > >>>> >> I've created a doc below for discussion. Please feel free
>> to
>> > add
>> > > > > >>>> more in
>> > > > > >>>> >> here:
>> > > > > >>>> >>
>> > > > > >>>> >>
>> > > > > >>>>
>> > > > >
>> > > >
>> >
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> > > > > >>>> >>
>> > > > > >>>> >> Thanks,
>> > > > > >>>> >> Botong
>> > > > > >>>> >>
>> > > > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
>> > > > > jhyde.apache@gmail.com
>> > > > > >>>> >
>> > > > > >>>> >> wrote:
>> > > > > >>>> >>
>> > > > > >>>> >>> PS The “editable doc” that Rui refers to is also a good
>> > idea. I
>> > > > > >>>> think we
>> > > > > >>>> >>> should create it to continue discussion after the first
>> > meeting.
>> > > > > >>>> >>>
>> > > > > >>>> >>> Julian
>> > > > > >>>> >>>
>> > > > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
>> > > > > jhyde.apache@gmail.com>
>> > > > > >>>> >>> wrote:
>> > > > > >>>> >>>>
>> > > > > >>>> >>>> I think good next steps would be a PR and a meeting.
>> The
>> > PR
>> > > > will
>> > > > > >>>> allow
>> > > > > >>>> >>> us to read the code, but I think we should do the first
>> > round of
>> > > > > >>>> questions
>> > > > > >>>> >>> at the meeting.  The meeting could perhaps start with a
>> > > > > >>>> presentation of the
>> > > > > >>>> >>> paper (do you have some slides you are planning to
>> present
>> > at
>> > > > > VLDB,
>> > > > > >>>> >>> Botong?) and then move on to questions about the
>> concepts,
>> > which
>> > > > > >>>> >>> alternatives were considered, and how the concepts map
>> onto
>> > > > other
>> > > > > >>>> current
>> > > > > >>>> >>> and future concepts in calcite.
>> > > > > >>>> >>>>
>> > > > > >>>> >>>> I don’t think we should start “reviewing” the PR
>> > line-by-line
>> > > > at
>> > > > > >>>> this
>> > > > > >>>> >>> point. We need to understand the high-level concepts and
>> > design
>> > > > > >>>> choices. If
>> > > > > >>>> >>> we start reviewing the PR we will get lost in the
>> details.
>> > > > > >>>> >>>>
>> > > > > >>>> >>>> I know that integrating a major change is hard; I doubt
>> > that we
>> > > > > >>>> will be
>> > > > > >>>> >>> able to integrate everything, but we can build
>> understanding
>> > > > about
>> > > > > >>>> where
>> > > > > >>>> >>> calcite needs to go, and I hope integrate a good amount
>> of
>> > code
>> > > > to
>> > > > > >>>> help us
>> > > > > >>>> >>> get there.
>> > > > > >>>> >>>>
>> > > > > >>>> >>>> As I said before, after the integration I would like
>> > people to
>> > > > be
>> > > > > >>>> able
>> > > > > >>>> >>> to experiment with it and use it in their production
>> > systems.
>> > > > > That
>> > > > > >>>> way, it
>> > > > > >>>> >>> will not be an experiment that withers, but a feature set that
>> > > > > >>>> >>> integrates with other Calcite features and gets stronger over time.
>> > > > > >>>> >>>>
>> > > > > >>>> >>>> Julian
>> > > > > >>>> >>>>
>> > > > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
>> > amaliujia@apache.org>
>> > > > > >>>> wrote:
>> > > > > >>>> >>>>>
>> > > > > >>>> >>>>> For me to participate in the discussion for the above
>> > > > > questions,
>> > > > > >>>> I
>> > > > > >>>> >>> will
>> > > > > >>>> >>>>> need to read a lot more to know relevant context and
>> > likely
>> > > > ask
>> > > > > >>>> lots of
>> > > > > >>>> >>>>> questions :-).  An editable doc is probably good for questions
>> > > > > >>>> >>>>> and back-and-forth discussion.
>> > > > > >>>> >>>>>
>> > > > > >>>> >>>>>
>> > > > > >>>> >>>>> -Rui
>> > > > > >>>> >>>>>
>> > > > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
>> > > > > amaliujia@apache.org
>> > > > > >>>> >
>> > > > > >>>> >>> wrote:
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>> I am also happy to help push this work into Calcite
>> > (review
>> > > > > code
>> > > > > >>>> and
>> > > > > >>>> >>> doc,
>> > > > > >>>> >>>>>> etc.).
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>> While you can share your code so people can have more
>> > idea
>> > > > how
>> > > > > >>>> it is
>> > > > > >>>> >>>>>> implemented, I think it would be also nice to have a
>> doc
>> > to
>> > > > > >>>> discuss
>> > > > > >>>> >>> open
>> > > > > >>>> >>>>>> questions above. Some points that I copy here:
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>> 1. Can this solution be compatible with existing
>> > solutions in
>> > > > > >>>> Calcite
>> > > > > >>>> >>>>>> Streaming, materialized view maintenance, and
>> multi-query
>> > > > > >>>> optimization
>> > > > > >>>> >>>>>> (Sigma and Delta relational operators, lattice, and
>> Spool
>> > > > > >>>> operator),
>> > > > > >>>> >>>>>> 2. Did you find that you needed two separate cost
>> models
>> > -
>> > > > one
>> > > > > >>>> for
>> > > > > >>>> >>> “view
>> > > > > >>>> >>>>>> maintenance” and another for “user queries” - since
>> the
>> > > > > >>>> objectives of
>> > > > > >>>> >>> each
>> > > > > >>>> >>>>>> activity are so different?
>> > > > > >>>> >>>>>> 3. whether this work will hasten the arrival of
>> > > > multi-objective
>> > > > > >>>> >>> parametric
>> > > > > >>>> >>>>>> query optimization [1] in Calcite.
>> > > > > >>>> >>>>>> 4. probably SQL shell support.
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>> [1]:
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>
>> > > > > >>>>
>> > > > >
>> > > >
>> >
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>> -Rui
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
>> > zinking3@gmail.com>
>> > > > > >>>> wrote:
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>> it would be very nice to see a POC of your work.
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
>> > > > > >>>> pkuhbt@gmail.com>
>> > > > > >>>> >>> wrote:
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>>> Hi Julian,
>> > > > > >>>> >>>>>>>>
>> > > > > >>>> >>>>>>>> Just wondering if there are any updates? We are
>> > wondering
>> > > > if
>> > > > > it
>> > > > > >>>> >>> would
>> > > > > >>>> >>>>>>> help
>> > > > > >>>> >>>>>>>> to post our code for a quick preview.
>> > > > > >>>> >>>>>>>>
>> > > > > >>>> >>>>>>>> Thanks,
>> > > > > >>>> >>>>>>>> Botong
>> > > > > >>>> >>>>>>>>
>> > > > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
>> > > > > pkuhbt@gmail.com
>> > > > > >>>> >
>> > > > > >>>> >>> wrote:
>> > > > > >>>> >>>>>>>>
>> > > > > >>>> >>>>>>>>> Hi Julian,
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a
>> plan
>> > > > that
>> > > > > >>>> best
>> > > > > >>>> >>>>>>> benefits
>> > > > > >>>> >>>>>>>>> the community. Here are some clarifications that
>> > hopefully
>> > > > > >>>> answer
>> > > > > >>>> >>> your
>> > > > > >>>> >>>>>>>>> questions.
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of
>> time
>> > > > points
>> > > > > to
>> > > > > >>>> >>>>>>> consider
>> > > > > >>>> >>>>>>>>> running and a cost function that expresses users'
>> > > > preference
>> > > > > >>>> over
>> > > > > >>>> >>>>>>> time,
>> > > > > >>>> >>>>>>>>> Tempura will generate the best incremental plan
>> that
>> > > > > >>>> minimizes the
>> > > > > >>>> >>>>>>>> overall
>> > > > > >>>> >>>>>>>>> cost function.
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at
>> different
>> > time
>> > > > > >>>> points
>> > > > > >>>> >>> can
>> > > > > >>>> >>>>>>> be
>> > > > > >>>> >>>>>>>>> different from each other, as opposed to identical
>> > plans
>> > > > in
>> > > > > >>>> all
>> > > > > >>>> >>> delta
>> > > > > >>>> >>>>>>>> runs
>> > > > > >>>> >>>>>>>>> as in streaming or IVM. As mentioned in Section 2.1 of the
>> > > > Tempura
>> > > > > >>>> paper,
>> > > > > >>>> >>> we
>> > > > > >>>> >>>>>>> can
>> > > > > >>>> >>>>>>>>> mimic the current streaming implementation by
>> > specifying
>> > > > two
>> > > > > >>>> >>> (logical)
>> > > > > >>>> >>>>>>>> time
>> > > > > >>>> >>>>>>>>> points in Tempura, representing the initial run and
>> > later
>> > > > > >>>> delta
>> > > > > >>>> >>> runs
>> > > > > >>>> >>>>>>>>> respectively. In general, note that Tempura supports various forms of
>> > > > > >>>> >>>>>>>>> incremental computing, not only the small-delta
>> > > > append-only
>> > > > > >>>> data
>> > > > > >>>> >>>>>>> model in
>> > > > > >>>> >>>>>>>>> streaming systems. That's why we believe Tempura
>> > subsumes
>> > > > > the
>> > > > > >>>> >>> current
>> > > > > >>>> >>>>>>>>> streaming support, as well as any IVM
>> implementations.
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model, but
>> > > > > >>>> >>>>>>>>> rather extended the existing one. Similar to
>> > > > multi-objective
>> > > > > >>>> >>>>>>>> optimization,
>> > > > > >>>> >>>>>>>>> costs incurred at different time points are
>> considered
>> > > > > >>>> different
>> > > > > >>>> >>>>>>>>> dimensions. Tempura lets users supply a function
>> that
>> > > > > >>>> converts this
>> > > > > >>>> >>>>>>> cost
>> > > > > >>>> >>>>>>>>> vector into a final cost. So under this function,
>> any
>> > two
>> > > > > >>>> >>> incremental
>> > > > > >>>> >>>>>>>> plans
>> > > > > >>>> >>>>>>>>> are still comparable and there is an overall
>> optimum.
>> > I
>> > > > > guess
>> > > > > >>>> we
>> > > > > >>>> >>> can
>> > > > > >>>> >>>>>>> go
>> > > > > >>>> >>>>>>>>> down the route of multi-objective parametric query
>> > > > > >>>> optimization
>> > > > > >>>> >>>>>>> instead
>> > > > > >>>> >>>>>>>> if
>> > > > > >>>> >>>>>>>>> there is a need.
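
The cost-vector idea in the paragraph above can be sketched roughly as follows. This is only an illustration in Python, not Tempura's or Calcite's actual API: the names `weighted_sum` and `best_plan`, the two plan labels, and all numbers are hypothetical. Each plan carries one cost per time point, and a user-supplied function collapses that vector into a single comparable number.

```python
from typing import Callable, Dict, Sequence

# One cost entry per time point at which part of the plan may run.
CostVector = Sequence[float]
CostFunction = Callable[[CostVector], float]


def weighted_sum(weights: Sequence[float]) -> CostFunction:
    """Build a user cost function that weights each time point (an assumed,
    deliberately simple choice of aggregation)."""
    def combine(costs: CostVector) -> float:
        if len(costs) != len(weights):
            raise ValueError("expected one cost per time point")
        return sum(w * c for w, c in zip(weights, costs))
    return combine


def best_plan(plans: Dict[str, CostVector], combine: CostFunction) -> str:
    """Return the name of the plan with the lowest final (scalar) cost."""
    return min(plans, key=lambda name: combine(plans[name]))


# Hypothetical costs at two time points: an early run and the final run.
plans = {
    "batch_at_end": [0.0, 10.0],   # do all the work at the last time point
    "incremental": [6.0, 3.0],     # do most of the work early
}

early_is_cheap = weighted_sum([0.25, 1.0])
early_is_expensive = weighted_sum([2.0, 1.0])
```

With these made-up numbers, `best_plan(plans, early_is_cheap)` picks the incremental plan and `best_plan(plans, early_is_expensive)` picks the batch plan, which is the sense in which any two incremental plans stay comparable under a given user function.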
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> Next on materialized views and multi-query
>> > optimization,
>> > > > > >>>> since our
>> > > > > >>>> >>>>>>>>> multi-time-point plan naturally involves
>> materializing
>> > > > > >>>> intermediate
>> > > > > >>>> >>>>>>>> results
>> > > > > >>>> >>>>>>>>> for later time points, we need to solve the
>> problem of
>> > > > > >>>> choosing
>> > > > > >>>> >>>>>>>>> materializations and include the cost of saving and
>> > > > reusing
>> > > > > >>>> the
>> > > > > >>>> >>>>>>>>> materializations when costing and comparing plans.
>> We
>> > > > > >>>> borrowed the
>> > > > > >>>> >>>>>>>>> multi-query optimization techniques to solve this
>> > problem
>> > > > > even
>> > > > > >>>> >>> though
>> > > > > >>>> >>>>>>> we
>> > > > > >>>> >>>>>>>>> are looking at a single query. As a result, we
>> think
>> > our
>> > > > > work
>> > > > > >>>> is
>> > > > > >>>> >>>>>>>> orthogonal
>> > > > > >>>> >>>>>>>>> to Calcite's facilities around utilizing existing
>> > views,
>> > > > > >>>> lattice
>> > > > > >>>> >>> etc.
>> > > > > >>>> >>>>>>> We
>> > > > > >>>> >>>>>>>> do
>> > > > > >>>> >>>>>>>>> feel that the multi-query optimization component
>> can
>> > be
>> > > > > >>>> adopted to
>> > > > > >>>> >>>>>>> wider
>> > > > > >>>> >>>>>>>>> use, but probably need more suggestions from the
>> > > > community.
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code, so it
>> > > > > >>>> >>>>>>>>> should be straightforward to hook it up with a SQL shell.
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> Thanks,
>> > > > > >>>> >>>>>>>>> Botong
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>> > > > > >>>> >>> jhyde.apache@gmail.com>
>> > > > > >>>> >>>>>>>>> wrote:
>> > > > > >>>> >>>>>>>>>
>> > > > > >>>> >>>>>>>>>> Botong,
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> This is very exciting; congratulations on this
>> > research,
>> > > > > and
>> > > > > >>>> thank
>> > > > > >>>> >>>>>>> you
>> > > > > >>>> >>>>>>>>>> for contributing it back to Calcite.
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> The research touches several areas in Calcite:
>> > streaming,
>> > > > > >>>> >>>>>>> materialized
>> > > > > >>>> >>>>>>>>>> view maintenance, and multi-query optimization.
>> As we
>> > > > have
>> > > > > >>>> already
>> > > > > >>>> >>>>>>> some
>> > > > > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta
>> relational
>> > > > > >>>> operators,
>> > > > > >>>> >>>>>>> lattice,
>> > > > > >>>> >>>>>>>>>> and Spool operator), it will be interesting to see
>> > > > whether
>> > > > > >>>> we can
>> > > > > >>>> >>>>>>> make
>> > > > > >>>> >>>>>>>> them
>> > > > > >>>> >>>>>>>>>> compatible, or whether one concept can subsume
>> > others.
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> Your work differs from streaming queries in that
>> your
>> > > > > >>>> relations
>> > > > > >>>> >>> are
>> > > > > >>>> >>>>>>> used
>> > > > > >>>> >>>>>>>>>> by “external” user queries, whereas in pure
>> streaming
>> > > > > >>>> queries, the
>> > > > > >>>> >>>>>>> only
>> > > > > >>>> >>>>>>>>>> activity is the change propagation. Did you find
>> > that you
>> > > > > >>>> needed
>> > > > > >>>> >>> two
>> > > > > >>>> >>>>>>>>>> separate cost models - one for “view maintenance”
>> and
>> > > > > >>>> another for
>> > > > > >>>> >>>>>>> “user
>> > > > > >>>> >>>>>>>>>> queries” - since the objectives of each activity
>> are
>> > so
>> > > > > >>>> different?
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> I wonder whether this work will hasten the
>> arrival of
>> > > > > >>>> >>> multi-objective
>> > > > > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> I will make time over the next few days to read
>> and
>> > > > digest
>> > > > > >>>> your
>> > > > > >>>> >>>>>>> paper.
>> > > > > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth
>> > process
>> > > > to
>> > > > > >>>> create
>> > > > > >>>> >>>>>>>>>> something that will be useful for the broader
>> > community.
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> One thing will be particularly useful: making this
>> > > > > >>>> functionality
>> > > > > >>>> >>>>>>>>>> available from a SQL shell, so that people can
>> > experiment
>> > > > > >>>> with
>> > > > > >>>> >>> this
>> > > > > >>>> >>>>>>>>>> functionality without writing Java code or
>> setting up
>> > > > > complex
>> > > > > >>>> >>>>>>> databases
>> > > > > >>>> >>>>>>>> and
>> > > > > >>>> >>>>>>>>>> metadata. I have in mind something like the simple
>> > DDL
>> > > > > >>>> operations
>> > > > > >>>> >>>>>>> that
>> > > > > >>>> >>>>>>>> are
>> > > > > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder
>> > whether
>> > > > we
>> > > > > >>>> could
>> > > > > >>>> >>>>>>> devise
>> > > > > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> Julian
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>> [1]
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>
>> > > > > >>>>
>> > > > >
>> > > >
>> >
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
>> > > > > pkuhbt@gmail.com
>> > > > > >>>> >
>> > > > > >>>> >>>>>>> wrote:
>> > > > > >>>> >>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the
>> > figure,
>> > > > > please
>> > > > > >>>> >>> refer
>> > > > > >>>> >>>>>>> to
>> > > > > >>>> >>>>>>>>>> Fig
>> > > > > >>>> >>>>>>>>>>> 3(a) in our paper:
>> > > > > >>>> >>>>>>>>>>
>> > https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>> > > > > >>>> >>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>> Best,
>> > > > > >>>> >>>>>>>>>>> Botong
>> > > > > >>>> >>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>> > > > > >>>> taojiatao@gmail.com>
>> > > > > >>>> >>>>>>>> wrote:
>> > > > > >>>> >>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you
>> > > > > >>>> >>>>>>>>>>>> open a JIRA for this, so that people who are interested can
>> > > > > >>>> >>>>>>>>>>>> subscribe to it?
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>> Regards!
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>> Aron Tao
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> Hi all,
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite
>> optimizer
>> > > > into
>> > > > > a
>> > > > > >>>> >>> general
>> > > > > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our
>> research
>> > > > paper
>> > > > > >>>> >>>>>>> published
>> > > > > >>>> >>>>>>>> in
>> > > > > >>>> >>>>>>>>>>>> VLDB
>> > > > > >>>> >>>>>>>>>>>>> 2021:
>> > > > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer
>> framework
>> > for
>> > > > > >>>> >>> incremental
>> > > > > >>>> >>>>>>>> data
>> > > > > >>>> >>>>>>>>>>>>> processing
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating
>> > how
>> > > > > >>>> Alibaba’s
>> > > > > >>>> >>>>>>> data
>> > > > > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental
>> > query
>> > > > > >>>> optimizer
>> > > > > >>>> >>> to
>> > > > > >>>> >>>>>>>>>>>> alleviate
>> > > > > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
>> > > > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
>> > Resource-Aware
>> > > > > >>>> >>> Incremental
>> > > > > >>>> >>>>>>>>>>>> Computing
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based incremental
>> > > > > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across
>> > multiple
>> > > > > >>>> families
>> > > > > >>>> >>> of
>> > > > > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
>> > > > Streaming,
>> > > > > >>>> >>>>>>> DBToaster,
>> > > > > >>>> >>>>>>>>>> etc.
>> > > > > >>>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated best plan is
>> > > > > >>>> >>>>>>>>>>>>> consistently much better than the plans from each individual method alone.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view
>> > > > > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in active
>> > > > > >>>> >>>>>>>>>>>>> databases, resumable query execution, approximate query processing, etc. We
>> > > > > >>>> >>>>>>>>>>>>> are hoping that this feature can help widen the spectrum of Calcite and
>> > > > > >>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical
>> > details.
>> > > > > >>>> Please
>> > > > > >>>> >>>>>>> refer
>> > > > > >>>> >>>>>>>> to
>> > > > > >>>> >>>>>>>>>>>> the
>> > > > > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also
>> > working
>> > > > on a
>> > > > > >>>> >>> journal
>> > > > > >>>> >>>>>>>>>> version
>> > > > > >>>> >>>>>>>>>>>> of
>> > > > > >>>> >>>>>>>>>>>>> the paper with more implementation details.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite
>> is
>> > meant
>> > > > > to
>> > > > > >>>> be
>> > > > > >>>> >>>>>>>> executed
>> > > > > >>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s
>> > memo
>> > > > will
>> > > > > >>>> be
>> > > > > >>>> >>>>>>> extended
>> > > > > >>>> >>>>>>>>>> with
>> > > > > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
>> > > > generating
>> > > > > >>>> >>>>>>> incremental
>> > > > > >>>> >>>>>>>>>>>> plans
>> > > > > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
>> > > > different
>> > > > > >>>> time
>> > > > > >>>> >>>>>>> points.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that
>> > > > changes
>> > > > > >>>> over
>> > > > > >>>> >>> time
>> > > > > >>>> >>>>>>>>>> (Time
>> > > > > >>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
>> > > > introduced
>> > > > > >>>> >>>>>>> TvrMetaSet
>> > > > > >>>> >>>>>>>>>> into
>> > > > > >>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to
>> > track
>> > > > > >>>> related
>> > > > > >>>> >>>>>>> RelSets
>> > > > > >>>> >>>>>>>>>> of a
>> > > > > >>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at
>> > certain
>> > > > > >>>> time,
>> > > > > >>>> >>>>>>> delta of
>> > > > > >>>> >>>>>>>>>> the
>> > > > > >>>> >>>>>>>>>>>>> table between two time points, etc.).
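
To make the snapshot and delta notions above concrete, here is a minimal sketch of a time-varying relation. It is not Calcite or Tempura code: the class `AppendOnlyTvr` and its methods are hypothetical, and it assumes an append-only table, which is only one of the data models the proposal mentions. Under that assumption, the snapshot at a later time equals the snapshot at an earlier time plus the delta between the two.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class AppendOnlyTvr:
    """A toy time-varying relation: rows tagged with their arrival time."""
    rows: List[Tuple[int, Any]] = field(default_factory=list)

    def append(self, time: int, row: Any) -> None:
        self.rows.append((time, row))

    def snapshot(self, t: int) -> List[Any]:
        """Contents of the table as of time t (inclusive)."""
        return [row for ts, row in self.rows if ts <= t]

    def delta(self, t1: int, t2: int) -> List[Any]:
        """Rows that arrived after t1 and up to t2 (inclusive)."""
        return [row for ts, row in self.rows if t1 < ts <= t2]


s = AppendOnlyTvr()
s.append(1, "a")
s.append(2, "b")
s.append(3, "c")

# Append-only identity (an assumption of this sketch):
# snapshot(t2) == snapshot(t1) + delta(t1, t2)
rebuilt = s.snapshot(2) + s.delta(2, 3)
```

In the proposal's terms, each of `snapshot(2)`, `delta(2, 3)` and `snapshot(3)` would correspond to a separate RelSet, and a TvrMetaSet would be the bookkeeping that records that they all belong to the same changing table.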
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> [image: image.png]
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
>> > > > > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
>> > > > > >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users can write TVR
>> > > > > >>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations between these dots. For
>> > > > > >>>> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules that describe how to compute
>> > > > > >>>> >>>>>>>>>>>>> a certain RelSet of a TVR from RelSets of other TVRs. The red lines are
>> > > > > >>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a TVR. All TVR rewrite
>> > > > > >>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still work in the new
>> > > > > >>>> >>>>>>>>>>>>> volcano system without modification.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of
>> four
>> > > > parts:
>> > > > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>> > > > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching
>> > TvrMetaSet
>> > > > > and
>> > > > > >>>> >>>>>>> RelNodes,
>> > > > > >>>> >>>>>>>>>> as
>> > > > > >>>> >>>>>>>>>>>>> well as links in between the nodes.
>> > > > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the
>> > upgraded
>> > > > > >>>> rule
>> > > > > >>>> >>>>>>> engine
>> > > > > >>>> >>>>>>>>>> API.
>> > > > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the
>> best
>> > > > > >>>> incremental
>> > > > > >>>> >>>>>>> plan
>> > > > > >>>> >>>>>>>>>>>>> involving multiple time points.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in
>> nature
>> > and
>> > > > > thus
>> > > > > >>>> when
>> > > > > >>>> >>>>>>>>>> disabled,
>> > > > > >>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also
>> applied
>> > > > this
>> > > > > >>>> >>>>>>>>>> Calcite-extended
>> > > > > >>>> >>>>>>>>>>>>> incremental query optimizer to a type of
>> periodic
>> > > > query
>> > > > > >>>> called
>> > > > > >>>> >>>>>>> the
>> > > > > >>>> >>>>>>>>>>>> ‘‘range
>> > > > > >>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It
>> achieved
>> > cost
>> > > > > >>>> savings
>> > > > > >>>> >>> of
>> > > > > >>>> >>>>>>> 80%
>> > > > > >>>> >>>>>>>>>> on
>> > > > > >>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on
>> > > > end-to-end
>> > > > > >>>> >>> execution
>> > > > > >>>> >>>>>>>>>> time.
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome.
>> Thanks
>> > and
>> > > > > happy
>> > > > > >>>> >>>>>>> holidays!
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>> Best,
>> > > > > >>>> >>>>>>>>>>>>> Botong
>> > > > > >>>> >>>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>>>
>> > > > > >>>> >>>>>>>>
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>> --
>> > > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
>> > > > > >>>> >>>>>>> no mistakes
>> > > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
>> > > > > >>>> >>>>>>>
>> > > > > >>>> >>>>>>
>> > > > > >>>> >>>
>> > > > > >>>> >>
>> > > > > >>>>
>> > > > > >>>>
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Viliam Durina
>> > > > Jet Developer
>> > > >       hazelcast®
>> > > >
>> > > >   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA
>> > 94402 |
>> > > > USA
>> > > > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
>> > > >
>> > > >
>> >
>>
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi Stamatis and all,

Thanks for the interest! Let's tentatively schedule the next meeting for next
Wednesday, May 12, 10pm-11pm PST. Please let us know if any new needs come
up.

Best,
Botong

On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hello,
>
> I really regret missing the first meeting, sorry about that. I added my
> preferences in the document.
> I will make sure to attend the next one and help as much as I can.
>
> I didn't have the chance yet to go over the paper but will try to do it
> before the next meeting.
>
> For me the following dates are more convenient than others so it would be
> nice if we could arrange it then.
>
> Thu, May 6, 10pm PST
> Tue, May 12, 10pm PST
>
> Best,
> Stamatis
>
> On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:
>
> > I have added my time preferences to the doc [1]. I am generally
> > available any evening Mon - Thu. How about we meet Monday 10th May?
> >
> > Stamatis, Jesus, Given the complexity of this work, I would very much
> > appreciate your insight, as experts in optimizer theory. Could one of
> > you join the next meeting? Of course we should choose a time that
> > works for everyone's schedule.
> >
> > Julian
> >
> > [1]
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >
> > On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com> wrote:
> > >
> > > We didn't record it, but we will try to record the following meetings.
> > > Please add your time preferences in the doc, so that we can find a meeting
> > > time that works for more people.
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vi...@hazelcast.com>
> > wrote:
> > >
> > > > Is there a recording available?
> > > > Viliam
> > > >
> > > > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The meeting yesterday was fun and productive. As discussed, this is
> > the
> > > > > call to schedule our second meeting.
> > > > >
> > > > > We encourage everyone to add their time preferences during 05/01 -
> > 05/15
> > > > > here:
> > > > >
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >
> > > > > Thanks,
> > > > > Botong
> > > > >
> > > > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi all,
> > > > > > We've created a zoom meeting below for our meeting next Monday
> > > > > > (9pm-10:30pm PST on 04/26).
> > > > > > Talk to you all soon!
> > > > > >
> > > > > > Join Zoom Meeting
> > > > > > https://uci.zoom.us/j/91279732686
> > > > > > <
> > > > >
> > > >
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > > > > >
> > > > > >
> > > > > > Meeting ID: 912 7973 2686
> > > > > > One tap mobile
> > > > > > +16699006833,,91279732686# US (San Jose)
> > > > > > +12532158782,,91279732686# US (Tacoma)
> > > > > >
> > > > > > Dial by your location
> > > > > > +1 669 900 6833 US (San Jose)
> > > > > > +1 253 215 8782 US (Tacoma)
> > > > > > +1 346 248 7799 US (Houston)
> > > > > > +1 301 715 8592 US (Washington DC)
> > > > > > +1 312 626 6799 US (Chicago)
> > > > > > +1 646 558 8656 US (New York)
> > > > > > Meeting ID: 912 7973 2686
> > > > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > > > > <
> > > > >
> > > >
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > > > > >
> > > > > >
> > > > > > Join by Skype for Business
> > > > > > https://uci.zoom.us/skype/91279732686
> > > > > > <
> > > > >
> > > >
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Botong
> > > > > >
> > > > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> According to the preferences collected, we are tentatively
> > scheduling
> > > > > our
> > > > > >> meeting at 9pm-10:30pm PST on 04/26 Monday.
> > > > > >>
> > > > > >> We will give a presentation about Tempura, followed by a free
> > > > > discussion.
> > > > > >>
> > > > > > >> Please let us know if there are any other requests. A few days
> > before
> > > > > >> the meeting, I will send out a zoom meeting link.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Botong
> > > > > >>
> > > > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com>
> > wrote:
> > > > > >>
> > > > > >>> Hi Julian and all,
> > > > > >>>
> > > > > >>> We've posted the Tempura code base below. Feel free to take a
> > quick
> > > > > peek
> > > > > >>> at the last five commits.
> > > > > >>>
> > > > >
> > https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > > > >>>
> > > > > >>> I've also opened a Jira (CALCITE-4568
> > > > > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which
> > will
> > > > > serve
> > > > > >>> as the umbrella Jira for the feature.
> > > > > >>>
> > > > > >>> In the meantime, we encourage everyone to enter the time
> > preferences
> > > > > for
> > > > > >>> our first meeting here:
> > > > > >>>
> > > > > >>>
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Botong
> > > > > >>>
> > > > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> > jhyde.apache@gmail.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> I have added my time preferences to the doc.
> > > > > >>>>
> > > > > >>>> Before we meet, could you publish a PR for us to review?
> > > > > >>>>
> > > > > >>>> Initial discussions will need to be about architecture and
> > > > high-level
> > > > > >>>> design. So I would ask Calcite reviewers not to review the PR
> > > > > line-by-line
> > > > > >>>> (or to leave comments in GitHub) but try to understand the
> > design
> > > > > >>>> holistically, and prepare questions/comments before the
> meeting.
> > > > > >>>>
> > > > > >>>> Botong, Can you please create a Calcite JIRA case for this
> task?
> > > > JIRA
> > > > > > >>>> is how we track long-running tasks such as this.
> > > > > >>>>
> > > > > >>>> Julian
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com>
> > > > wrote:
> > > > > >>>> >
> > > > > >>>> > Hi all,
> > > > > >>>> >
> > > > > > >>>> > Apologies for the delay. It took us some time to clean up our
> > code
> > > > > base
> > > > > >>>> and
> > > > > >>>> > publicly release it (which will be out soon) for a quick
> peek.
> > > > > >>>> >
> > > > > >>>> > We are ready to present our work. Let's schedule a time for
> a
> > Zoom
> > > > > >>>> > meeting and discuss how to integrate Tempura into Calcite.
> > > > > >>>> >
> > > > > >>>> > Since some of our team members are in China, we prefer the
> > time
> > > > slot
> > > > > >>>> of
> > > > > >>>> > 7:00pm-11:30pm PST any day. I've added our time preference
> in
> > the
> > > > > >>>> shared
> > > > > >>>> > doc below.
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > > >>>> >
> > > > > >>>> > We encourage everyone to add their time preferences (during
> > > > > >>>> 04/15-04/30) in
> > > > > >>>> > this doc. In a week or so, we will try to settle a time that
> > works
> > > > > for
> > > > > >>>> > most.
> > > > > >>>> >
> > > > > >>>> > Thanks,
> > > > > >>>> > Botong
> > > > > >>>> >
> > > > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> > pkuhbt@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>> >
> > > > > >>>> >> Hi Julian and Rui,
> > > > > >>>> >>
> > > > > >>>> >> Sounds good to us. Please give us some time to prepare some
> > > > slides
> > > > > >>>> for the
> > > > > >>>> >> meeting.
> > > > > >>>> >>
> > > > > >>>> >> I've created a doc below for discussion. Please feel free
> to
> > add
> > > > > >>>> more in
> > > > > >>>> >> here:
> > > > > >>>> >>
> > > > > >>>> >>
> > > > > >>>>
> > > > >
> > > >
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > > >>>> >>
> > > > > >>>> >> Thanks,
> > > > > >>>> >> Botong
> > > > > >>>> >>
> > > > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > > > > jhyde.apache@gmail.com
> > > > > >>>> >
> > > > > >>>> >> wrote:
> > > > > >>>> >>
> > > > > >>>> >>> PS The “editable doc” that Rui refers to is also a good
> > idea. I
> > > > > >>>> think we
> > > > > >>>> >>> should create it to continue discussion after the first
> > meeting.
> > > > > >>>> >>>
> > > > > >>>> >>> Julian
> > > > > >>>> >>>
> > > > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > > > > jhyde.apache@gmail.com>
> > > > > >>>> >>> wrote:
> > > > > >>>> >>>>
> > > > > >>>> >>>> I think good next steps would be a PR and a meeting. The
> > PR
> > > > will
> > > > > >>>> allow
> > > > > >>>> >>> us to read the code, but I think we should do the first
> > round of
> > > > > >>>> questions
> > > > > >>>> >>> at the meeting.  The meeting could perhaps start with a
> > > > > >>>> presentation of the
> > > > > >>>> >>> paper (do you have some slides you are planning to present
> > at
> > > > > VLDB,
> > > > > >>>> >>> Botong?) and then move on to questions about the concepts,
> > which
> > > > > >>>> >>> alternatives were considered, and how the concepts map
> onto
> > > > other
> > > > > >>>> current
> > > > > >>>> >>> and future concepts in calcite.
> > > > > >>>> >>>>
> > > > > >>>> >>>> I don’t think we should start “reviewing” the PR
> > line-by-line
> > > > at
> > > > > >>>> this
> > > > > >>>> >>> point. We need to understand the high-level concepts and
> > design
> > > > > >>>> choices. If
> > > > > >>>> >>> we start reviewing the PR we will get lost in the details.
> > > > > >>>> >>>>
> > > > > >>>> >>>> I know that integrating a major change is hard; I doubt
> > that we
> > > > > >>>> will be
> > > > > >>>> >>> able to integrate everything, but we can build
> understanding
> > > > about
> > > > > >>>> where
> > > > > >>>> >>> calcite needs to go, and I hope integrate a good amount of
> > code
> > > > to
> > > > > >>>> help us
> > > > > >>>> >>> get there.
> > > > > >>>> >>>>
> > > > > >>>> >>>> As I said before, after the integration I would like
> > people to
> > > > be
> > > > > >>>> able
> > > > > >>>> >>> to experiment with it and use it in their production
> > systems.
> > > > > That
> > > > > >>>> way, it
> > > > > >>>> >>> will not be an experiment that withers, but a feature set
> > > > > > >>>> that integrates with
> > > > > >>>> >>> other calcite features and gets stronger over time.
> > > > > >>>> >>>>
> > > > > >>>> >>>> Julian
> > > > > >>>> >>>>
> > > > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> > amaliujia@apache.org>
> > > > > >>>> wrote:
> > > > > >>>> >>>>>
> > > > > >>>> >>>>> For me to participate in the discussion for the above
> > > > > questions,
> > > > > >>>> I
> > > > > >>>> >>> will
> > > > > >>>> >>>>> need to read a lot more to know relevant context and
> > likely
> > > > ask
> > > > > >>>> lots of
> > > > > > >>>> >>>>> questions :-).  An editable doc is probably good for
> > questions
> > > > > and
> > > > > >>>> back
> > > > > >>>> >>> and
> > > > > > >>>> >>>>> forth discussion.
> > > > > >>>> >>>>>
> > > > > >>>> >>>>>
> > > > > >>>> >>>>> -Rui
> > > > > >>>> >>>>>
> > > > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > > > > amaliujia@apache.org
> > > > > >>>> >
> > > > > >>>> >>> wrote:
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>> I am also happy to help push this work into Calcite
> > (review
> > > > > code
> > > > > >>>> and
> > > > > >>>> >>> doc,
> > > > > >>>> >>>>>> etc.).
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>> While you can share your code so people can have more
> > idea
> > > > how
> > > > > >>>> it is
> > > > > >>>> >>>>>> implemented, I think it would be also nice to have a
> doc
> > to
> > > > > >>>> discuss
> > > > > >>>> >>> open
> > > > > > >>>> >>>>>> questions above. Some points that I copy here:
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>> 1. Can this solution be compatible with existing
> > solutions in
> > > > > >>>> Calcite
> > > > > >>>> >>>>>> Streaming, materialized view maintenance, and
> multi-query
> > > > > >>>> optimization
> > > > > >>>> >>>>>> (Sigma and Delta relational operators, lattice, and
> Spool
> > > > > >>>> operator),
> > > > > >>>> >>>>>> 2. Did you find that you needed two separate cost
> models
> > -
> > > > one
> > > > > >>>> for
> > > > > >>>> >>> “view
> > > > > >>>> >>>>>> maintenance” and another for “user queries” - since the
> > > > > >>>> objectives of
> > > > > >>>> >>> each
> > > > > >>>> >>>>>> activity are so different?
> > > > > >>>> >>>>>> 3. whether this work will hasten the arrival of
> > > > multi-objective
> > > > > >>>> >>> parametric
> > > > > >>>> >>>>>> query optimization [1] in Calcite.
> > > > > >>>> >>>>>> 4. probably SQL shell support.
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>> [1]:
> > > > > >>>> >>>>>>
> > > > > >>>> >>>
> > > > > >>>>
> > > > >
> > > >
> >
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>> -Rui
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>>
> > > > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> > zinking3@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>> it would be very nice to see a POC of your work.
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > > > > >>>> pkuhbt@gmail.com>
> > > > > >>>> >>> wrote:
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>>> Hi Julian,
> > > > > >>>> >>>>>>>>
> > > > > >>>> >>>>>>>> Just wondering if there are any updates? We are
> > wondering
> > > > if
> > > > > it
> > > > > >>>> >>> would
> > > > > >>>> >>>>>>> help
> > > > > >>>> >>>>>>>> to post our code for a quick preview.
> > > > > >>>> >>>>>>>>
> > > > > >>>> >>>>>>>> Thanks,
> > > > > >>>> >>>>>>>> Botong
> > > > > >>>> >>>>>>>>
> > > > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > > > > pkuhbt@gmail.com
> > > > > >>>> >
> > > > > >>>> >>> wrote:
> > > > > >>>> >>>>>>>>
> > > > > >>>> >>>>>>>>> Hi Julian,
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a
> plan
> > > > that
> > > > > >>>> best
> > > > > >>>> >>>>>>> benefits
> > > > > >>>> >>>>>>>>> the community. Here are some clarifications that
> > hopefully
> > > > > >>>> answer
> > > > > >>>> >>> your
> > > > > >>>> >>>>>>>>> questions.
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time
> > > > points
> > > > > to
> > > > > >>>> >>>>>>> consider
> > > > > >>>> >>>>>>>>> running and a cost function that expresses users'
> > > > preference
> > > > > >>>> over
> > > > > >>>> >>>>>>> time,
> > > > > >>>> >>>>>>>>> Tempura will generate the best incremental plan that
> > > > > >>>> minimizes the
> > > > > >>>> >>>>>>>> overall
> > > > > >>>> >>>>>>>>> cost function.
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different
> > time
> > > > > >>>> points
> > > > > >>>> >>> can
> > > > > >>>> >>>>>>> be
> > > > > >>>> >>>>>>>>> different from each other, as opposed to identical
> > plans
> > > > in
> > > > > >>>> all
> > > > > >>>> >>> delta
> > > > > >>>> >>>>>>>> runs
> > > > > >>>> >>>>>>>>> as in streaming or IVM. As mentioned in $2.1 of the
> > > > Tempura
> > > > > >>>> paper,
> > > > > >>>> >>> we
> > > > > >>>> >>>>>>> can
> > > > > >>>> >>>>>>>>> mimic the current streaming implementation by
> > specifying
> > > > two
> > > > > >>>> >>> (logical)
> > > > > >>>> >>>>>>>> time
> > > > > >>>> >>>>>>>>> points in Tempura, representing the initial run and
> > later
> > > > > >>>> delta
> > > > > >>>> >>> runs
> > > > > >>>> >>>>>>>>> respectively. In general, note that Tempura supports
> > > > various
> > > > > > >>>> forms
> > > > > >>>> >>> of
> > > > > >>>> >>>>>>>>> incremental computing, not only the small-delta
> > > > append-only
> > > > > >>>> data
> > > > > >>>> >>>>>>> model in
> > > > > >>>> >>>>>>>>> streaming systems. That's why we believe Tempura
> > subsumes
> > > > > the
> > > > > >>>> >>> current
> > > > > >>>> >>>>>>>>> streaming support, as well as any IVM
> implementations.
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> About the cost model, we did not come up with a
> > separate
> > > > > cost
> > > > > >>>> >>> model,
> > > > > >>>> >>>>>>> but
> > > > > >>>> >>>>>>>>> rather extended the existing one. Similar to
> > > > multi-objective
> > > > > >>>> >>>>>>>> optimization,
> > > > > >>>> >>>>>>>>> costs incurred at different time points are
> considered
> > > > > >>>> different
> > > > > >>>> >>>>>>>>> dimensions. Tempura lets users supply a function
> that
> > > > > >>>> converts this
> > > > > >>>> >>>>>>> cost
> > > > > >>>> >>>>>>>>> vector into a final cost. So under this function,
> any
> > two
> > > > > >>>> >>> incremental
> > > > > >>>> >>>>>>>> plans
> > > > > >>>> >>>>>>>>> are still comparable and there is an overall
> optimum.
> > I
> > > > > guess
> > > > > >>>> we
> > > > > >>>> >>> can
> > > > > >>>> >>>>>>> go
> > > > > >>>> >>>>>>>>> down the route of multi-objective parametric query
> > > > > >>>> optimization
> > > > > >>>> >>>>>>> instead
> > > > > >>>> >>>>>>>> if
> > > > > >>>> >>>>>>>>> there is a need.
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> Next on materialized views and multi-query
> > optimization,
> > > > > >>>> since our
> > > > > >>>> >>>>>>>>> multi-time-point plan naturally involves
> materializing
> > > > > >>>> intermediate
> > > > > >>>> >>>>>>>> results
> > > > > >>>> >>>>>>>>> for later time points, we need to solve the problem
> of
> > > > > >>>> choosing
> > > > > >>>> >>>>>>>>> materializations and include the cost of saving and
> > > > reusing
> > > > > >>>> the
> > > > > >>>> >>>>>>>>> materializations when costing and comparing plans.
> We
> > > > > >>>> borrowed the
> > > > > >>>> >>>>>>>>> multi-query optimization techniques to solve this
> > problem
> > > > > even
> > > > > >>>> >>> though
> > > > > >>>> >>>>>>> we
> > > > > >>>> >>>>>>>>> are looking at a single query. As a result, we think
> > our
> > > > > work
> > > > > >>>> is
> > > > > >>>> >>>>>>>> orthogonal
> > > > > >>>> >>>>>>>>> to Calcite's facilities around utilizing existing
> > views,
> > > > > >>>> lattice
> > > > > >>>> >>> etc.
> > > > > >>>> >>>>>>> We
> > > > > >>>> >>>>>>>> do
> > > > > >>>> >>>>>>>>> feel that the multi-query optimization component can
> > be
> > > > > >>>> adopted to
> > > > > >>>> >>>>>>> wider
> > > > > >>>> >>>>>>>>> use, but probably need more suggestions from the
> > > > community.
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> Lastly, our current implementation is set up in java
> > code,
> > > > > it
> > > > > >>>> >>> should
> > > > > >>>> >>>>>>> be
> > > > > >>>> >>>>>>>>> straightforward to hook it up with SQL shell.
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> Thanks,
> > > > > >>>> >>>>>>>>> Botong
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > > > > >>>> >>> jhyde.apache@gmail.com>
> > > > > >>>> >>>>>>>>> wrote:
> > > > > >>>> >>>>>>>>>
> > > > > >>>> >>>>>>>>>> Botong,
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> This is very exciting; congratulations on this
> > research,
> > > > > and
> > > > > >>>> thank
> > > > > >>>> >>>>>>> you
> > > > > >>>> >>>>>>>>>> for contributing it back to Calcite.
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> The research touches several areas in Calcite:
> > streaming,
> > > > > >>>> >>>>>>> materialized
> > > > > >>>> >>>>>>>>>> view maintenance, and multi-query optimization. As
> we
> > > > have
> > > > > >>>> already
> > > > > >>>> >>>>>>> some
> > > > > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta
> relational
> > > > > >>>> operators,
> > > > > >>>> >>>>>>> lattice,
> > > > > >>>> >>>>>>>>>> and Spool operator), it will be interesting to see
> > > > whether
> > > > > >>>> we can
> > > > > >>>> >>>>>>> make
> > > > > >>>> >>>>>>>> them
> > > > > >>>> >>>>>>>>>> compatible, or whether one concept can subsume
> > others.
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> Your work differs from streaming queries in that
> your
> > > > > >>>> relations
> > > > > >>>> >>> are
> > > > > >>>> >>>>>>> used
> > > > > >>>> >>>>>>>>>> by “external” user queries, whereas in pure
> streaming
> > > > > >>>> queries, the
> > > > > >>>> >>>>>>> only
> > > > > >>>> >>>>>>>>>> activity is the change propagation. Did you find
> > that you
> > > > > >>>> needed
> > > > > >>>> >>> two
> > > > > >>>> >>>>>>>>>> separate cost models - one for “view maintenance”
> and
> > > > > >>>> another for
> > > > > >>>> >>>>>>> “user
> > > > > >>>> >>>>>>>>>> queries” - since the objectives of each activity
> are
> > so
> > > > > >>>> different?
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival
> of
> > > > > >>>> >>> multi-objective
> > > > > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> I will make time over the next few days to read and
> > > > digest
> > > > > >>>> your
> > > > > >>>> >>>>>>> paper.
> > > > > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth
> > process
> > > > to
> > > > > >>>> create
> > > > > >>>> >>>>>>>>>> something that will be useful for the broader
> > community.
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> One thing will be particularly useful: making this
> > > > > >>>> functionality
> > > > > >>>> >>>>>>>>>> available from a SQL shell, so that people can
> > experiment
> > > > > >>>> with
> > > > > >>>> >>> this
> > > > > >>>> >>>>>>>>>> functionality without writing Java code or setting
> up
> > > > > complex
> > > > > >>>> >>>>>>> databases
> > > > > >>>> >>>>>>>> and
> > > > > >>>> >>>>>>>>>> metadata. I have in mind something like the simple
> > DDL
> > > > > >>>> operations
> > > > > >>>> >>>>>>> that
> > > > > >>>> >>>>>>>> are
> > > > > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder
> > whether
> > > > we
> > > > > >>>> could
> > > > > >>>> >>>>>>> devise
> > > > > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> Julian
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>> [1]
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>
> > > > > >>>>
> > > > >
> > > >
> >
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > > > > pkuhbt@gmail.com
> > > > > >>>> >
> > > > > >>>> >>>>>>> wrote:
> > > > > >>>> >>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the
> > figure,
> > > > > please
> > > > > >>>> >>> refer
> > > > > >>>> >>>>>>> to
> > > > > >>>> >>>>>>>>>> Fig
> > > > > >>>> >>>>>>>>>>> 3(a) in our paper:
> > > > > >>>> >>>>>>>>>>
> > https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > > > >>>> >>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>> Best,
> > > > > >>>> >>>>>>>>>>> Botong
> > > > > >>>> >>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > > > > >>>> taojiatao@gmail.com>
> > > > > >>>> >>>>>>>> wrote:
> > > > > >>>> >>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>> Seems interesting, the pic can not be seen in the
> > mail,
> > > > > >>>> may you
> > > > > >>>> >>>>>>> open
> > > > > >>>> >>>>>>>> a
> > > > > >>>> >>>>>>>>>> JIRA
> > > > > >>>> >>>>>>>>>>>> for this, people who are interested in this can
> > > > subscribe
> > > > > >>>> to the
> > > > > >>>> >>>>>>>> JIRA?
> > > > > >>>> >>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>> Regards!
> > > > > >>>> >>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>> Aron Tao
> > > > > >>>> >>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>
> > > > > > >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
> > > > > >>>> >>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> Hi all,
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite
> optimizer
> > > > into
> > > > > a
> > > > > >>>> >>> general
> > > > > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our
> research
> > > > paper
> > > > > >>>> >>>>>>> published
> > > > > >>>> >>>>>>>> in
> > > > > >>>> >>>>>>>>>>>> VLDB
> > > > > >>>> >>>>>>>>>>>>> 2021:
> > > > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer
> framework
> > for
> > > > > >>>> >>> incremental
> > > > > >>>> >>>>>>>> data
> > > > > >>>> >>>>>>>>>>>>> processing
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating
> > how
> > > > > >>>> Alibaba’s
> > > > > >>>> >>>>>>> data
> > > > > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental
> > query
> > > > > >>>> optimizer
> > > > > >>>> >>> to
> > > > > >>>> >>>>>>>>>>>> alleviate
> > > > > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
> > > > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting
> > Resource-Aware
> > > > > >>>> >>> Incremental
> > > > > >>>> >>>>>>>>>>>> Computing
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> To our best knowledge, this is the first general
> > > > > >>>> cost-based
> > > > > >>>> >>>>>>>>>> incremental
> > > > > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across
> > multiple
> > > > > >>>> families
> > > > > >>>> >>> of
> > > > > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
> > > > Streaming,
> > > > > >>>> >>>>>>> DBToaster,
> > > > > >>>> >>>>>>>>>> etc.
> > > > > > >>>> >>>>>>>>>>>>> Experiments (in the paper) show that the
> > generated
> > > > best
> > > > > >>>> plan
> > > > > >>>> >>> is
> > > > > >>>> >>>>>>>>>>>>> consistently much better than the plans from
> each
> > > > > >>>> individual
> > > > > >>>> >>>>>>> method
> > > > > >>>> >>>>>>>>>>>> alone.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> In general, incremental query planning is
> central
> > to
> > > > > >>>> database
> > > > > >>>> >>>>>>> view
> > > > > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and
> > is
> > > > being
> > > > > >>>> >>> adopted
> > > > > >>>> >>>>>>> in
> > > > > >>>> >>>>>>>>>>>> active
> > > > > >>>> >>>>>>>>>>>>> databases, resumable query execution,
> approximate
> > > > query
> > > > > >>>> >>>>>>> processing,
> > > > > >>>> >>>>>>>>>> etc.
> > > > > >>>> >>>>>>>>>>>> We
> > > > > > >>>> >>>>>>>>>>>>> are hoping that this feature can help widen
> the
> > > > > >>>> spectrum of
> > > > > >>>> >>>>>>>>>> Calcite,
> > > > > >>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details. Please refer
> > > > > >>>> >>>>>>>>>>>>> to the Tempura paper for more details. We are also working on a
> > > > > >>>> >>>>>>>>>>>>> journal version of the paper with more implementation details.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> Currently, the query plan generated by Calcite is meant to be executed
> > > > > >>>> >>>>>>>>>>>>> all at once. In the proposal, Calcite’s memo will be extended with
> > > > > >>>> >>>>>>>>>>>>> temporal information so that it is capable of generating incremental
> > > > > >>>> >>>>>>>>>>>>> plans that include multiple sub-plans to execute at different time
> > > > > >>>> >>>>>>>>>>>>> points.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time
> > > > > >>>> >>>>>>>>>>>>> (a Time-Varying Relation, or TVR). To achieve that, we introduced
> > > > > >>>> >>>>>>>>>>>>> TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
> > > > > >>>> >>>>>>>>>>>>> track the related RelSets of a changing table (e.g. a snapshot of the
> > > > > >>>> >>>>>>>>>>>>> table at a certain time, the delta of the table between two time
> > > > > >>>> >>>>>>>>>>>>> points, etc.).
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> [image: image.png]
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
> > > > > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
> > > > > >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users can
> > > > > >>>> >>>>>>>>>>>>> write TVR Rewrite Rules to describe valid transformations between
> > > > > >>>> >>>>>>>>>>>>> these dots. For example, the blue lines are inter-TVR rules that
> > > > > >>>> >>>>>>>>>>>>> describe how to compute a certain RelSet of a TVR from RelSets of
> > > > > >>>> >>>>>>>>>>>>> other TVRs. The red lines are intra-TVR rules that describe
> > > > > >>>> >>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are logical rules.
> > > > > >>>> >>>>>>>>>>>>> All existing Calcite rules still work in the new volcano system
> > > > > >>>> >>>>>>>>>>>>> without modification.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
> > > > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes,
> > > > > >>>> >>>>>>>>>>>>>    as well as links in between the nodes.
> > > > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine
> > > > > >>>> >>>>>>>>>>>>>    API.
> > > > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
> > > > > >>>> >>>>>>>>>>>>>    involving multiple time points.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus, when
> > > > > >>>> >>>>>>>>>>>>> disabled, does not change any existing Calcite behavior.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> Other than the scenarios in the paper, we also applied this
> > > > > >>>> >>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic
> > > > > >>>> >>>>>>>>>>>>> query called the "range query" in Alibaba’s data warehouse. It
> > > > > >>>> >>>>>>>>>>>>> achieved cost savings of 80% on total CPU and memory consumption, and
> > > > > >>>> >>>>>>>>>>>>> 60% on end-to-end execution time.
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>> Best,
> > > > > >>>> >>>>>>>>>>>>> Botong
> > > > > >>>> >>>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>>>
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>>>
> > > > > >>>> >>>>>>>>
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>> --
> > > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > > > > >>>> >>>>>>> no mistakes
> > > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > > > > >>>> >>>>>>>
> > > > > >>>> >>>>>>
> > > > > >>>> >>>
> > > > > >>>> >>
> > > > > >>>>
> > > > > >>>>
> > > > >
> > > >
> > > >
> > > > --
> > > > Viliam Durina
> > > > Jet Developer
> > > >       hazelcast®
> > > >
> > > >   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA
> > 94402 |
> > > > USA
> > > > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
> > > >
> > > >
> >
>
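
The TvrMetaSet idea in the proposal quoted above can be pictured with a
small data-structure sketch. The Python below is only an illustration under
stated assumptions: the class TvrSketch, its fields, and the method
derive_snapshot are hypothetical names, not Calcite or Tempura code, and the
rule shown (new snapshot = old snapshot plus delta) is assumed to hold only
for insert-only changes. It groups the snapshots and deltas of one changing
table, and applies one intra-TVR style rewrite to derive a later snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Rows = FrozenSet[tuple]


@dataclass
class TvrSketch:
    """Toy stand-in for a TvrMetaSet: groups the versions of one changing table.

    Hypothetical illustration only; not the Tempura or Calcite data structure.
    """
    name: str
    # time point -> rows of the table as of that time
    snapshots: Dict[int, Rows] = field(default_factory=dict)
    # (t1, t2) -> rows inserted between t1 and t2
    deltas: Dict[Tuple[int, int], Rows] = field(default_factory=dict)

    def derive_snapshot(self, t1: int, t2: int) -> Rows:
        """Intra-TVR style rewrite: snapshot(t2) = snapshot(t1) UNION delta(t1, t2).

        Assumption: changes are insert-only, so a set union is sufficient.
        """
        result = self.snapshots[t1] | self.deltas[(t1, t2)]
        self.snapshots[t2] = result
        return result


orders = TvrSketch("orders")
orders.snapshots[1] = frozenset({(1, "a"), (2, "b")})
orders.deltas[(1, 2)] = frozenset({(3, "c")})

# Derive the snapshot at time 2 without rescanning the whole table.
snapshot_t2 = orders.derive_snapshot(1, 2)
```

In the proposal, each such version corresponds to a RelSet in the memo, and
rules like this one are what connect the dots in the figure; the optimizer,
not the user, decides which derivation is cheapest.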

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

I really regret missing the first meeting, sorry about that. I added my
preferences in the document.
I will make sure to attend the next one and help as much as I can.

I didn't have the chance yet to go over the paper but will try to do it
before the next meeting.

For me the following dates are more convenient than others so it would be
nice if we could arrange it then.

Thu, May 6, 10pm PST
Tue, May 12, 10pm PST

Best,
Stamatis

On Sat, May 1, 2021 at 9:42 PM Julian Hyde <jh...@apache.org> wrote:

> I have added my time preferences to the doc [1]. I am generally
> available any evening Mon - Thu. How about we meet Monday 10th May?
>
> Stamatis, Jesus: given the complexity of this work, I would very much
> appreciate your insight, as experts in optimizer theory. Could one of
> you join the next meeting? Of course we should choose a time that
> works for everyone's schedule.
>
> Julian
>
> [1]
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>
> On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com> wrote:
> >
> > We didn't record it, but we will try to record the following meetings.
> > Please add your time preferences to the doc, so that we can find a
> > meeting time that works for more people.
> >
> > Thanks,
> > Botong
> >
> > On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vi...@hazelcast.com>
> wrote:
> >
> > > Is there a recording available?
> > > Viliam
> > >
> > > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The meeting yesterday was fun and productive. As discussed, this is
> > > > the call to schedule our second meeting.
> > > >
> > > > We encourage everyone to add their time preferences during 05/01 -
> > > > 05/15 here:
> > > >
> > > > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >
> > > > Thanks,
> > > > Botong
> > > >
> > > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com>
> wrote:
> > > >
> > > > > Hi all,
> > > > > We've created a zoom meeting below for our meeting next Monday
> > > > > (9pm-10:30pm PST on 04/26).
> > > > > Talk to you all soon!
> > > > >
> > > > > Join Zoom Meeting
> > > > > https://uci.zoom.us/j/91279732686
> > > > >
> > > > > Meeting ID: 912 7973 2686
> > > > > One tap mobile
> > > > > +16699006833,,91279732686# US (San Jose)
> > > > > +12532158782,,91279732686# US (Tacoma)
> > > > >
> > > > > Dial by your location
> > > > > +1 669 900 6833 US (San Jose)
> > > > > +1 253 215 8782 US (Tacoma)
> > > > > +1 346 248 7799 US (Houston)
> > > > > +1 301 715 8592 US (Washington DC)
> > > > > +1 312 626 6799 US (Chicago)
> > > > > +1 646 558 8656 US (New York)
> > > > > Meeting ID: 912 7973 2686
> > > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > > >
> > > > > Join by Skype for Business
> > > > > https://uci.zoom.us/skype/91279732686
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Botong
> > > > >
> > > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> According to the preferences collected, we are tentatively scheduling
> > > > >> our meeting at 9pm-10:30pm PST on Monday, 04/26.
> > > > >>
> > > > >> We will give a presentation about Tempura, followed by a free
> > > > >> discussion.
> > > > >>
> > > > >> Please let us know if there are any other requests. A few days before
> > > > >> the meeting, I will send out a Zoom meeting link.
> > > > >>
> > > > >> Thanks,
> > > > >> Botong
> > > > >>
> > > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com>
> wrote:
> > > > >>
> > > > >>> Hi Julian and all,
> > > > >>>
> > > > >>> We've posted the Tempura code base below. Feel free to take a quick
> > > > >>> peek at the last five commits.
> > > > >>>
> > > > >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > > >>>
> > > > >>> I've also opened a Jira (CALCITE-4568
> > > > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will
> > > > >>> serve as the umbrella Jira for the feature.
> > > > >>>
> > > > >>> In the meantime, we encourage everyone to enter their time preferences
> > > > >>> for our first meeting here:
> > > > >>>
> > > > >>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Botong
> > > > >>>
> > > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <
> jhyde.apache@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> I have added my time preferences to the doc.
> > > > >>>>
> > > > >>>> Before we meet, could you publish a PR for us to review?
> > > > >>>>
> > > > >>>> Initial discussions will need to be about architecture and high-level
> > > > >>>> design. So I would ask Calcite reviewers not to review the PR
> > > > >>>> line-by-line (or to leave comments in GitHub) but to try to understand
> > > > >>>> the design holistically, and to prepare questions/comments before the
> > > > >>>> meeting.
> > > > >>>>
> > > > >>>> Botong, can you please create a Calcite JIRA case for this task? JIRA
> > > > >>>> is how we track long-running tasks such as this.
> > > > >>>>
> > > > >>>> Julian
> > > > >>>>
> > > > >>>>
> > > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com>
> > > wrote:
> > > > >>>> >
> > > > >>>> > Hi all,
> > > > >>>> >
> > > > >>>> > Apologies for the delay. It took us some time to clean up our code
> > > > >>>> > base and publicly release it (which will be out soon) for a quick
> > > > >>>> > peek.
> > > > >>>> >
> > > > >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> > > > >>>> > meeting and discuss how to integrate Tempura into Calcite.
> > > > >>>> >
> > > > >>>> > Since some of our team members are in China, we prefer the time slot
> > > > >>>> > of 7:00pm-11:30pm PST any day. I've added our time preference in the
> > > > >>>> > shared doc below.
> > > > >>>> >
> > > > >>>> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >>>> >
> > > > >>>> > We encourage everyone to add their time preferences (during
> > > > >>>> > 04/15-04/30) in this doc. In a week or so, we will try to settle on a
> > > > >>>> > time that works for most.
> > > > >>>> >
> > > > >>>> > Thanks,
> > > > >>>> > Botong
> > > > >>>> >
> > > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <
> pkuhbt@gmail.com>
> > > > >>>> wrote:
> > > > >>>> >
> > > > >>>> >> Hi Julian and Rui,
> > > > >>>> >>
> > > > >>>> >> Sounds good to us. Please give us some time to prepare some slides
> > > > >>>> >> for the meeting.
> > > > >>>> >>
> > > > >>>> >> I've created a doc below for discussion. Please feel free to add more
> > > > >>>> >> in here:
> > > > >>>> >>
> > > > >>>> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >>>> >>
> > > > >>>> >> Thanks,
> > > > >>>> >> Botong
> > > > >>>> >>
> > > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > > > jhyde.apache@gmail.com
> > > > >>>> >
> > > > >>>> >> wrote:
> > > > >>>> >>
> > > > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I think
> > > > >>>> >>> we should create it to continue discussion after the first meeting.
> > > > >>>> >>>
> > > > >>>> >>> Julian
> > > > >>>> >>>
> > > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > > > jhyde.apache@gmail.com>
> > > > >>>> >>> wrote:
> > > > >>>> >>>>
> > > > >>>> >>>> I think good next steps would be a PR and a meeting. The PR will
> > > > >>>> >>>> allow us to read the code, but I think we should do the first round
> > > > >>>> >>>> of questions at the meeting. The meeting could perhaps start with a
> > > > >>>> >>>> presentation of the paper (do you have some slides you are planning
> > > > >>>> >>>> to present at VLDB, Botong?) and then move on to questions about the
> > > > >>>> >>>> concepts, which alternatives were considered, and how the concepts
> > > > >>>> >>>> map onto other current and future concepts in Calcite.
> > > > >>>> >>>>
> > > > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at this
> > > > >>>> >>>> point. We need to understand the high-level concepts and design
> > > > >>>> >>>> choices. If we start reviewing the PR we will get lost in the details.
> > > > >>>> >>>>
> > > > >>>> >>>> I know that integrating a major change is hard; I doubt that we will
> > > > >>>> >>>> be able to integrate everything, but we can build understanding about
> > > > >>>> >>>> where Calcite needs to go, and I hope integrate a good amount of code
> > > > >>>> >>>> to help us get there.
> > > > >>>> >>>>
> > > > >>>> >>>> As I said before, after the integration I would like people to be able
> > > > >>>> >>>> to experiment with it and use it in their production systems. That
> > > > >>>> >>>> way, it will not be an experiment that withers, but a feature set that
> > > > >>>> >>>> integrates with other Calcite features and gets stronger over time.
> > > > >>>> >>>>
> > > > >>>> >>>> Julian
> > > > >>>> >>>>
> > > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <
> amaliujia@apache.org>
> > > > >>>> wrote:
> > > > >>>> >>>>>
> > > > >>>> >>>>> For me to participate in the discussion of the above questions, I
> > > > >>>> >>>>> will need to read a lot more to learn the relevant context, and will
> > > > >>>> >>>>> likely ask lots of questions :-). An editable doc is probably good
> > > > >>>> >>>>> for questions and back-and-forth discussion.
> > > > >>>> >>>>>
> > > > >>>> >>>>>
> > > > >>>> >>>>> -Rui
> > > > >>>> >>>>>
> > > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > > > amaliujia@apache.org
> > > > >>>> >
> > > > >>>> >>> wrote:
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> I am also happy to help push this work into Calcite (review code and
> > > > >>>> >>>>>> doc, etc.).
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> While you can share your code so people can get a better idea of how
> > > > >>>> >>>>>> it is implemented, I think it would also be nice to have a doc to
> > > > >>>> >>>>>> discuss the open questions above. Some points that I copy here:
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> 1. Can this solution be compatible with the existing solutions in
> > > > >>>> >>>>>>    Calcite for streaming, materialized view maintenance, and
> > > > >>>> >>>>>>    multi-query optimization (Sigma and Delta relational operators,
> > > > >>>> >>>>>>    lattice, and Spool operator)?
> > > > >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for
> > > > >>>> >>>>>>    “view maintenance” and another for “user queries” - since the
> > > > >>>> >>>>>>    objectives of each activity are so different?
> > > > >>>> >>>>>> 3. Whether this work will hasten the arrival of multi-objective
> > > > >>>> >>>>>>    parametric query optimization [1] in Calcite.
> > > > >>>> >>>>>> 4. Probably SQL shell support.
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> [1]:
> > > > >>>> >>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> -Rui
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <
> zinking3@gmail.com>
> > > > >>>> wrote:
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> it would be very nice to see a POC of your work.
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > > > >>>> pkuhbt@gmail.com>
> > > > >>>> >>> wrote:
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>> Hi Julian,
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it
> > > > >>>> >>>>>>>> would help to post our code for a quick preview.
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> Thanks,
> > > > >>>> >>>>>>>> Botong
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > > > pkuhbt@gmail.com
> > > > >>>> >
> > > > >>>> >>> wrote:
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>>> Hi Julian,
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best
> > > > >>>> >>>>>>>>> benefits the community. Here are some clarifications that hopefully
> > > > >>>> >>>>>>>>> answer your questions.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points to
> > > > >>>> >>>>>>>>> consider running and a cost function that expresses their preference
> > > > >>>> >>>>>>>>> over time. Tempura will generate the best incremental plan that
> > > > >>>> >>>>>>>>> minimizes the overall cost function.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can
> > > > >>>> >>>>>>>>> be different from each other, as opposed to identical plans in all
> > > > >>>> >>>>>>>>> delta runs as in streaming or IVM. As mentioned in §2.1 of the
> > > > >>>> >>>>>>>>> Tempura paper, we can mimic the current streaming implementation by
> > > > >>>> >>>>>>>>> specifying two (logical) time points in Tempura, representing the
> > > > >>>> >>>>>>>>> initial run and later delta runs respectively. In general, note that
> > > > >>>> >>>>>>>>> Tempura supports various forms of incremental computing, not only the
> > > > >>>> >>>>>>>>> small-delta append-only data model in streaming systems. That's why
> > > > >>>> >>>>>>>>> we believe Tempura subsumes the current streaming support, as well as
> > > > >>>> >>>>>>>>> any IVM implementations.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model,
> > > > >>>> >>>>>>>>> but rather extended the existing one. Similar to multi-objective
> > > > >>>> >>>>>>>>> optimization, costs incurred at different time points are considered
> > > > >>>> >>>>>>>>> different dimensions. Tempura lets users supply a function that
> > > > >>>> >>>>>>>>> converts this cost vector into a final cost. So under this function,
> > > > >>>> >>>>>>>>> any two incremental plans are still comparable and there is an
> > > > >>>> >>>>>>>>> overall optimum. I guess we can go down the route of multi-objective
> > > > >>>> >>>>>>>>> parametric query optimization instead if there is a need.
> > > > >>>> >>>>>>>>>
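
As an illustration of the cost-vector idea in the quoted paragraph above, here
is a minimal Python sketch. It is not Tempura's code: IncrementalPlan,
weighted_sum, best_plan, the plan names, and all the numbers are hypothetical,
chosen only to show how a user-supplied function can collapse per-time-point
costs into one comparable number.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical names for illustration; none of this is Tempura's or Calcite's API.
CostFunction = Callable[[Sequence[float]], float]


@dataclass(frozen=True)
class IncrementalPlan:
    name: str
    costs: Tuple[float, ...]  # cost incurred at each time point, in order


def weighted_sum(weights: Sequence[float]) -> CostFunction:
    """Build a cost function that collapses a cost vector using fixed weights."""
    def cost(vector: Sequence[float]) -> float:
        if len(vector) != len(weights):
            raise ValueError("cost vector and weights differ in length")
        return sum(w * c for w, c in zip(weights, vector))
    return cost


def best_plan(plans: Sequence[IncrementalPlan], cost_fn: CostFunction) -> IncrementalPlan:
    """Return the plan with the smallest scalar cost under cost_fn."""
    return min(plans, key=lambda p: cost_fn(p.costs))


# Two time points: an early run and a final run (made-up costs).
plans = [
    IncrementalPlan("all-at-the-end", (0.0, 100.0)),
    IncrementalPlan("precompute-early", (70.0, 40.0)),
]

# Early resources are cheap (weight 0.2); final-time resources are expensive.
prefer_early = weighted_sum([0.2, 1.0])
# Every time point is weighted the same.
uniform = weighted_sum([1.0, 1.0])

choice_when_early_is_cheap = best_plan(plans, prefer_early).name
choice_when_uniform = best_plan(plans, uniform).name
```

With these made-up numbers the two cost functions pick different plans, which
is the point of letting the user supply the function: the same pair of plans
stays comparable, but the optimum depends on how the time points are weighted.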
> > > > >>>> >>>>>>>>> Next, on materialized views and multi-query optimization: since our
> > > > >>>> >>>>>>>>> multi-time-point plan naturally involves materializing intermediate
> > > > >>>> >>>>>>>>> results for later time points, we need to solve the problem of
> > > > >>>> >>>>>>>>> choosing materializations and include the cost of saving and reusing
> > > > >>>> >>>>>>>>> the materializations when costing and comparing plans. We borrowed
> > > > >>>> >>>>>>>>> the multi-query optimization techniques to solve this problem even
> > > > >>>> >>>>>>>>> though we are looking at a single query. As a result, we think our
> > > > >>>> >>>>>>>>> work is orthogonal to Calcite's facilities around utilizing existing
> > > > >>>> >>>>>>>>> views, lattice etc. We do feel that the multi-query optimization
> > > > >>>> >>>>>>>>> component can be adopted for wider use, but we probably need more
> > > > >>>> >>>>>>>>> suggestions from the community.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should
> > > > >>>> >>>>>>>>> be straightforward to hook it up with a SQL shell.
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Thanks,
> > > > >>>> >>>>>>>>> Botong
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > > > >>>> >>> jhyde.apache@gmail.com>
> > > > >>>> >>>>>>>>> wrote:
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>>> Botong,
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> This is very exciting; congratulations on this research, and thank
> > > > >>>> >>>>>>>>>> you for contributing it back to Calcite.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> > > > >>>> >>>>>>>>>> materialized view maintenance, and multi-query optimization. As we
> > > > >>>> >>>>>>>>>> already have some solutions in those areas (Sigma and Delta
> > > > >>>> >>>>>>>>>> relational operators, lattice, and Spool operator), it will be
> > > > >>>> >>>>>>>>>> interesting to see whether we can make them compatible, or whether
> > > > >>>> >>>>>>>>>> one concept can subsume others.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> Your work differs from streaming queries in that your relations are
> > > > >>>> >>>>>>>>>> used by “external” user queries, whereas in pure streaming queries,
> > > > >>>> >>>>>>>>>> the only activity is the change propagation. Did you find that you
> > > > >>>> >>>>>>>>>> needed two separate cost models - one for “view maintenance” and
> > > > >>>> >>>>>>>>>> another for “user queries” - since the objectives of each activity
> > > > >>>> >>>>>>>>>> are so different?
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> > > > >>>> >>>>>>>>>> multi-objective parametric query optimization [1] in Calcite.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> I will make time over the next few days to read and digest your
> > > > >>>> >>>>>>>>>> paper. Then I expect that we will have a back-and-forth process to
> > > > >>>> >>>>>>>>>> create something that will be useful for the broader community.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> One thing will be particularly useful: making this functionality
> > > > >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment with this
> > > > >>>> >>>>>>>>>> functionality without writing Java code or setting up complex
> > > > >>>> >>>>>>>>>> databases and metadata. I have in mind something like the simple DDL
> > > > >>>> >>>>>>>>>> operations that are available in Calcite’s ‘server’ module. I wonder
> > > > >>>> >>>>>>>>>> whether we could devise some kind of SQL syntax for a “multi-query”.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> Julian
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> [1]
> > > > >>>> >>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > > > pkuhbt@gmail.com
> > > > >>>> >
> > > > >>>> >>>>>>> wrote:
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please refer
> > > > >>>> >>>>>>>>>>> to Fig 3(a) in our paper:
> > > > >>>> >>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> Best,
> > > > >>>> >>>>>>>>>>> Botong
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > > > >>>> taojiatao@gmail.com>
> > > > >>>> >>>>>>>> wrote:
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you
> > > > >>>> >>>>>>>>>>>> open a JIRA for this, so that people who are interested can
> > > > >>>> >>>>>>>>>>>> subscribe to it?
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Regards!
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Aron Tao
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Botong Huang <bo...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Hi all,
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general
> > > > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper published
> > > > >>>> >>>>>>>>>>>>> in VLDB 2021:
> > > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental
> > > > >>>> >>>>>>>>>>>>> data processing
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> > > > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query optimizer to
> > > > >>>> >>>>>>>>>>>>> alleviate cluster-wise resource skewness:
> > > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> > > > >>>> >>>>>>>>>>>>> Computing
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> To our best knowledge, this is the first general
> > > > >>>> cost-based
> > > > >>>> >>>>>>>>>> incremental
> > > > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across
> multiple
> > > > >>>> families
> > > > >>>> >>> of
> > > > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
> > > Streaming,
> > > > >>>> >>>>>>> DBToaster,
> > > > >>>> >>>>>>>>>> etc.
> > > > >>>> >>>>>>>>>>>>> Experiments (in the paper) shows that the
> generated
> > > best
> > > > >>>> plan
> > > > >>>> >>> is
> > > > >>>> >>>>>>>>>>>>> consistently much better than the plans from each
> > > > >>>> individual
> > > > >>>> >>>>>>> method
> > > > >>>> >>>>>>>>>>>> alone.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> In general, incremental query planning is central
> to
> > > > >>>> database
> > > > >>>> >>>>>>> view
> > > > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and are
> > > being
> > > > >>>> >>> adopted
> > > > >>>> >>>>>>> in
> > > > >>>> >>>>>>>>>>>> active
> > > > >>>> >>>>>>>>>>>>> databases, resumable query execution, approximate
> > > query
> > > > >>>> >>>>>>> processing,
> > > > >>>> >>>>>>>>>> etc.
> > > > >>>> >>>>>>>>>>>> We
> > > > >>>> >>>>>>>>>>>>> are hoping that this feature can help widen the
> > > > >>>> spectrum of
> > > > >>>> >>>>>>>>>> Calcite,
> > > > >>>> >>>>>>>>>>>>> and solicit more use cases and adoption of Calcite.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical
> details.
> > > > >>>> Please
> > > > >>>> >>>>>>> refer
> > > > >>>> >>>>>>>> to
> > > > >>>> >>>>>>>>>>>> the
> > > > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also
> working
> > > on a
> > > > >>>> >>> journal
> > > > >>>> >>>>>>>>>> version
> > > > >>>> >>>>>>>>>>>> of
> > > > >>>> >>>>>>>>>>>>> the paper with more implementation details.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is
> meant
> > > > to
> > > > >>>> be
> > > > >>>> >>>>>>>> executed
> > > > >>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s
> memo
> > > will
> > > > >>>> be
> > > > >>>> >>>>>>> extended
> > > > >>>> >>>>>>>>>> with
> > > > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
> > > generating
> > > > >>>> >>>>>>> incremental
> > > > >>>> >>>>>>>>>>>> plans
> > > > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
> > > different
> > > > >>>> time
> > > > >>>> >>>>>>> points.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that
> > > changes
> > > > >>>> over
> > > > >>>> >>> time
> > > > >>>> >>>>>>>>>> (Time
> > > > >>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we
> > > introduced
> > > > >>>> >>>>>>> TvrMetaSet
> > > > >>>> >>>>>>>>>> into
> > > > >>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to
> track
> > > > >>>> related
> > > > >>>> >>>>>>> RelSets
> > > > >>>> >>>>>>>>>> of a
> > > > >>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at
> certain
> > > > >>>> time,
> > > > >>>> >>>>>>> delta of
> > > > >>>> >>>>>>>>>> the
> > > > >>>> >>>>>>>>>>>>> table between two time points, etc.).
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> [image: image.png]
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> For example in the above figure, each vertical
> line
> > > is a
> > > > >>>> >>>>>>> TvrMetaSet
> > > > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R,
> etc.).
> > > > >>>> >>> Horizontal
> > > > >>>> >>>>>>>> lines
> > > > >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a
> > > RelSet.
> > > > >>>> Users
> > > > >>>> >>> can
> > > > >>>> >>>>>>>>>> write
> > > > >>>> >>>>>>>>>>>> TVR
> > > > >>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations
> > > between
> > > > >>>> these
> > > > >>>> >>>>>>> dots.
> > > > >>>> >>>>>>>>>> For
> > > > >>>> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules that
> > > > >>>> describe how
> > > > >>>> >>> to
> > > > >>>> >>>>>>>>>> compute
> > > > >>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other
> TVRs.
> > > The
> > > > >>>> red
> > > > >>>> >>> lines
> > > > >>>> >>>>>>>> are
> > > > >>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations
> within a
> > > > >>>> TVR. All
> > > > >>>> >>>>>>> TVR
> > > > >>>> >>>>>>>>>>>> rewrite
> > > > >>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite
> rules
> > > > still
> > > > >>>> work
> > > > >>>> >>> in
> > > > >>>> >>>>>>>> the
> > > > >>>> >>>>>>>>>> new
> > > > >>>> >>>>>>>>>>>>> volcano system without modification.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four
> > > parts:
> > > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching
> TvrMetaSet
> > > > and
> > > > >>>> >>>>>>> RelNodes,
> > > > >>>> >>>>>>>>>> as
> > > > >>>> >>>>>>>>>>>>> well as links in between the nodes.
> > > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the
> upgraded
> > > > >>>> rule
> > > > >>>> >>>>>>> engine
> > > > >>>> >>>>>>>>>> API.
> > > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
> > > > >>>> incremental
> > > > >>>> >>>>>>> plan
> > > > >>>> >>>>>>>>>>>>> involving multiple time points.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature
> and
> > > > thus
> > > > >>>> when
> > > > >>>> >>>>>>>>>> disabled,
> > > > >>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied
> > > this
> > > > >>>> >>>>>>>>>> Calcite-extended
> > > > >>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic
> > > query
> > > > >>>> called
> > > > >>>> >>>>>>> the
> > > > >>>> >>>>>>>>>>>> ‘‘range
> > > > >>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved
> cost
> > > > >>>> savings
> > > > >>>> >>> of
> > > > >>>> >>>>>>> 80%
> > > > >>>> >>>>>>>>>> on
> > > > >>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> > > end-to-end
> > > > >>>> >>> execution
> > > > >>>> >>>>>>>>>> time.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks
> and
> > > > happy
> > > > >>>> >>>>>>> holidays!
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Best,
> > > > >>>> >>>>>>>>>>>>> Botong
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> --
> > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > > > >>>> >>>>>>> no mistakes
> > > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>
> > > > >>>> >>
> > > > >>>>
> > > > >>>>
> > > >
> > >
> > >
> > > --
> > > Viliam Durina
> > > Jet Developer
> > >       hazelcast®
> > >
> > >   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA
> 94402 |
> > > USA
> > > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
> > >
> > >
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Julian Hyde <jh...@apache.org>.
I have added my time preferences to the doc [1]. I am generally
available any evening Mon - Thu. How about we meet Monday 10th May?

Stamatis, Jesus, given the complexity of this work, I would very much
appreciate your insight, as experts in optimizer theory. Could one of
you join the next meeting? Of course we should choose a time that
works for everyone's schedule.

Julian

[1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pk...@gmail.com> wrote:
>
> We didn't record it, but we will try to record the following meetings. Please
> add your time preferences to the doc so that we can find a meeting time
> that works for more people.
>
> Thanks,
> Botong
>
> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vi...@hazelcast.com> wrote:
>
> > Is there a recording available?
> > Viliam
> >
> > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > The meeting yesterday was fun and productive. As discussed, this is the
> > > call to schedule our second meeting.
> > >
> > > We encourage everyone to add their time preferences during 05/01 - 05/15
> > > here:
> > >
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > > We've created a zoom meeting below for our meeting next Monday
> > > > (9pm-10:30pm PST on 04/26).
> > > > Talk to you all soon!
> > > >
> > > > Join Zoom Meeting
> > > > https://uci.zoom.us/j/91279732686
> > > > <
> > >
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > > >
> > > >
> > > > Meeting ID: 912 7973 2686
> > > > One tap mobile
> > > > +16699006833,,91279732686# US (San Jose)
> > > > +12532158782,,91279732686# US (Tacoma)
> > > >
> > > > Dial by your location
> > > > +1 669 900 6833 US (San Jose)
> > > > +1 253 215 8782 US (Tacoma)
> > > > +1 346 248 7799 US (Houston)
> > > > +1 301 715 8592 US (Washington DC)
> > > > +1 312 626 6799 US (Chicago)
> > > > +1 646 558 8656 US (New York)
> > > > Meeting ID: 912 7973 2686
> > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > > <
> > >
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > > >
> > > >
> > > > Join by Skype for Business
> > > > https://uci.zoom.us/skype/91279732686
> > > > <
> > >
> > https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Botong
> > > >
> > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com>
> > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> According to the preferences collected, we are tentatively scheduling
> > > our
> > > >> meeting at 9pm-10:30pm PST on 04/26 Monday.
> > > >>
> > > >> We will give a presentation about Tempura, followed by a free
> > > discussion.
> > > >>
> > > >> Please let us know if there are any other requests. A few days before
> > > >> the meeting, I will send out a Zoom meeting link.
> > > >>
> > > >> Thanks,
> > > >> Botong
> > > >>
> > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com> wrote:
> > > >>
> > > >>> Hi Julian and all,
> > > >>>
> > > >>> We've posted the Tempura code base below. Feel free to take a quick
> > > peek
> > > >>> at the last five commits.
> > > >>>
> > > https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > >>>
> > > >>> I've also opened a Jira (CALCITE-4568
> > > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will
> > > serve
> > > >>> as the umbrella Jira for the feature.
> > > >>>
> > > >>> In the meantime, we encourage everyone to enter the time preferences
> > > for
> > > >>> our first meeting here:
> > > >>>
> > > >>>
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>
> > > >>> Thanks,
> > > >>> Botong
> > > >>>
> > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> I have added my time preferences to the doc.
> > > >>>>
> > > >>>> Before we meet, could you publish a PR for us to review?
> > > >>>>
> > > >>>> Initial discussions will need to be about architecture and
> > high-level
> > > >>>> design. So I would ask Calcite reviewers not to review the PR
> > > line-by-line
> > > >>>> (or to leave comments in GitHub) but try to understand the design
> > > >>>> holistically, and prepare questions/comments before the meeting.
> > > >>>>
> > > >>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is
> > > >>>> how we track long-running tasks such as this.
> > > >>>>
> > > >>>> Julian
> > > >>>>
> > > >>>>
> > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com>
> > wrote:
> > > >>>> >
> > > >>>> > Hi all,
> > > >>>> >
> > > >>>> > Apologies for the delay. It took us some time to clean up our code base
> > > >>>> > and publicly release it (which will be out soon) for a quick peek.
> > > >>>> >
> > > >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> > > >>>> > meeting and discuss how to integrate Tempura into Calcite.
> > > >>>> >
> > > >>>> > Since some of our team members are in China, we prefer the time
> > slot
> > > >>>> of
> > > >>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
> > > >>>> shared
> > > >>>> > doc below.
> > > >>>> >
> > > >>>>
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>> >
> > > >>>> > We encourage everyone to add their time preferences (during
> > > >>>> 04/15-04/30) in
> > > >>>> > this doc. In a week or so, we will try to settle a time that works
> > > for
> > > >>>> > most.
> > > >>>> >
> > > >>>> > Thanks,
> > > >>>> > Botong
> > > >>>> >
> > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com>
> > > >>>> wrote:
> > > >>>> >
> > > >>>> >> Hi Julian and Rui,
> > > >>>> >>
> > > >>>> >> Sounds good to us. Please give us some time to prepare some
> > slides
> > > >>>> for the
> > > >>>> >> meeting.
> > > >>>> >>
> > > >>>> >> I've created a doc below for discussion. Please feel free to add
> > > >>>> more in
> > > >>>> >> here:
> > > >>>> >>
> > > >>>> >>
> > > >>>>
> > >
> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>> >>
> > > >>>> >> Thanks,
> > > >>>> >> Botong
> > > >>>> >>
> > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > > jhyde.apache@gmail.com
> > > >>>> >
> > > >>>> >> wrote:
> > > >>>> >>
> > > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
> > > >>>> think we
> > > >>>> >>> should create it to continue discussion after the first meeting.
> > > >>>> >>>
> > > >>>> >>> Julian
> > > >>>> >>>
> > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > > jhyde.apache@gmail.com>
> > > >>>> >>> wrote:
> > > >>>> >>>>
> > > >>>> >>>> I think good next steps would be a PR and a meeting. The PR
> > will
> > > >>>> allow
> > > >>>> >>> us to read the code, but I think we should do the first round of
> > > >>>> questions
> > > >>>> >>> at the meeting.  The meeting could perhaps start with a
> > > >>>> presentation of the
> > > >>>> >>> paper (do you have some slides you are planning to present at
> > > VLDB,
> > > >>>> >>> Botong?) and then move on to questions about the concepts, which
> > > >>>> >>> alternatives were considered, and how the concepts map onto
> > other
> > > >>>> current
> > > >>>> >>> and future concepts in calcite.
> > > >>>> >>>>
> > > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line
> > at
> > > >>>> this
> > > >>>> >>> point. We need to understand the high-level concepts and design
> > > >>>> choices. If
> > > >>>> >>> we start reviewing the PR we will get lost in the details.
> > > >>>> >>>>
> > > >>>> >>>> I know that integrating a major change is hard; I doubt that we
> > > >>>> will be
> > > >>>> >>> able to integrate everything, but we can build understanding
> > about
> > > >>>> where
> > > >>>> >>> calcite needs to go, and I hope we can integrate a good amount of code
> > to
> > > >>>> help us
> > > >>>> >>> get there.
> > > >>>> >>>>
> > > >>>> >>>> As I said before, after the integration I would like people to
> > be
> > > >>>> able
> > > >>>> >>> to experiment with it and use it in their production systems.
> > > That
> > > >>>> way, it
> > > >>>> >>> will not be an experiment that withers, but a feature set that
> > > >>>> >>> integrates with other Calcite features and gets stronger over time.
> > > >>>> >>>>
> > > >>>> >>>> Julian
> > > >>>> >>>>
> > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org>
> > > >>>> wrote:
> > > >>>> >>>>>
> > > >>>> >>>>> For me to participate in the discussion for the above
> > > questions,
> > > >>>> I
> > > >>>> >>> will
> > > >>>> >>>>> need to read a lot more to know relevant context and likely
> > ask
> > > >>>> lots of
> > > >>>> >>>>> questions :-).  An editable doc is probably good for questions and
> > > >>>> >>>>> back-and-forth discussion.
> > > >>>> >>>>>
> > > >>>> >>>>>
> > > >>>> >>>>> -Rui
> > > >>>> >>>>>
> > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > > amaliujia@apache.org
> > > >>>> >
> > > >>>> >>> wrote:
> > > >>>> >>>>>>
> > > >>>> >>>>>> I am also happy to help push this work into Calcite (review
> > > code
> > > >>>> and
> > > >>>> >>> doc,
> > > >>>> >>>>>> etc.).
> > > >>>> >>>>>>
> > > >>>> >>>>>> While you can share your code so people get a better idea of how it is
> > > >>>> >>>>>> implemented, I think it would also be nice to have a doc to discuss the
> > > >>>> >>>>>> open questions above. I have copied some of those points here:
> > > >>>> >>>>>>
> > > >>>> >>>>>> 1. Can this solution be made compatible with the existing solutions in
> > > >>>> >>>>>> Calcite for streaming, materialized view maintenance, and multi-query
> > > >>>> >>>>>> optimization (Sigma and Delta relational operators, lattice, and Spool
> > > >>>> >>>>>> operator)?
> > > >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view
> > > >>>> >>>>>> maintenance” and another for “user queries” - since the objectives of
> > > >>>> >>>>>> each activity are so different?
> > > >>>> >>>>>> 3. Will this work hasten the arrival of multi-objective parametric
> > > >>>> >>>>>> query optimization [1] in Calcite?
> > > >>>> >>>>>> 4. Probably SQL shell support.
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>> [1]:
> > > >>>> >>>>>>
> > > >>>> >>>
> > > >>>>
> > >
> > https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>> -Rui
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com>
> > > >>>> wrote:
> > > >>>> >>>>>>>
> > > >>>> >>>>>>> It would be very nice to see a POC of your work.
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > > >>>> pkuhbt@gmail.com>
> > > >>>> >>> wrote:
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>> Hi Julian,
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering
> > if
> > > it
> > > >>>> >>> would
> > > >>>> >>>>>>> help
> > > >>>> >>>>>>>> to post our code for a quick preview.
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Thanks,
> > > >>>> >>>>>>>> Botong
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > > pkuhbt@gmail.com
> > > >>>> >
> > > >>>> >>> wrote:
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>>> Hi Julian,
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan
> > that
> > > >>>> best
> > > >>>> >>>>>>> benefits
> > > >>>> >>>>>>>>> the community. Here are some clarifications that hopefully
> > > >>>> answer
> > > >>>> >>> your
> > > >>>> >>>>>>>>> questions.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points at which
> > > >>>> >>>>>>>>> to consider running and a cost function that expresses their preference
> > > >>>> >>>>>>>>> over time. Tempura will then generate the incremental plan that
> > > >>>> >>>>>>>>> minimizes the overall cost function.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can
> > > >>>> >>>>>>>>> be different from each other, as opposed to identical plans in all
> > > >>>> >>>>>>>>> delta runs as in streaming or IVM. As mentioned in Section 2.1 of the
> > > >>>> >>>>>>>>> Tempura paper, we can mimic the current streaming implementation by
> > > >>>> >>>>>>>>> specifying two (logical) time points in Tempura, representing the
> > > >>>> >>>>>>>>> initial run and later delta runs respectively. In general, note that
> > > >>>> >>>>>>>>> Tempura supports various forms of incremental computing, not only the
> > > >>>> >>>>>>>>> small-delta append-only data model in streaming systems. That's why we
> > > >>>> >>>>>>>>> believe Tempura subsumes the current streaming support, as well as any
> > > >>>> >>>>>>>>> IVM implementations.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model,
> > > >>>> >>>>>>>>> but rather extended the existing one. Similar to multi-objective
> > > >>>> >>>>>>>>> optimization, costs incurred at different time points are considered
> > > >>>> >>>>>>>>> different dimensions. Tempura lets users supply a function that
> > > >>>> >>>>>>>>> converts this cost vector into a final cost. So under this function,
> > > >>>> >>>>>>>>> any two incremental plans are still comparable and there is an overall
> > > >>>> >>>>>>>>> optimum. I guess we can go down the route of multi-objective parametric
> > > >>>> >>>>>>>>> query optimization instead if there is a need.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Next, on materialized views and multi-query optimization: since our
> > > >>>> >>>>>>>>> multi-time-point plan naturally involves materializing intermediate
> > > >>>> >>>>>>>>> results for later time points, we need to solve the problem of choosing
> > > >>>> >>>>>>>>> materializations and include the cost of saving and reusing the
> > > >>>> >>>>>>>>> materializations when costing and comparing plans. We borrowed the
> > > >>>> >>>>>>>>> multi-query optimization techniques to solve this problem even though
> > > >>>> >>>>>>>>> we are looking at a single query. As a result, we think our work is
> > > >>>> >>>>>>>>> orthogonal to Calcite's facilities around utilizing existing views,
> > > >>>> >>>>>>>>> lattice, etc. We do feel that the multi-query optimization component
> > > >>>> >>>>>>>>> can be adopted for wider use, but we probably need more suggestions
> > > >>>> >>>>>>>>> from the community.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should
> > > >>>> >>>>>>>>> be straightforward to hook it up with a SQL shell.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Thanks,
> > > >>>> >>>>>>>>> Botong
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > > >>>> >>> jhyde.apache@gmail.com>
> > > >>>> >>>>>>>>> wrote:
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>>> Botong,
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> This is very exciting; congratulations on this research,
> > > and
> > > >>>> thank
> > > >>>> >>>>>>> you
> > > >>>> >>>>>>>>>> for contributing it back to Calcite.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> > > >>>> >>>>>>> materialized
> > > >>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we
> > have
> > > >>>> already
> > > >>>> >>>>>>> some
> > > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
> > > >>>> operators,
> > > >>>> >>>>>>> lattice,
> > > >>>> >>>>>>>>>> and Spool operator), it will be interesting to see
> > whether
> > > >>>> we can
> > > >>>> >>>>>>> make
> > > >>>> >>>>>>>> them
> > > >>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> Your work differs from streaming queries in that your
> > > >>>> relations
> > > >>>> >>> are
> > > >>>> >>>>>>> used
> > > >>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
> > > >>>> queries, the
> > > >>>> >>>>>>> only
> > > >>>> >>>>>>>>>> activity is the change propagation. Did you find that you
> > > >>>> needed
> > > >>>> >>> two
> > > >>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
> > > >>>> another for
> > > >>>> >>>>>>> “user
> > > >>>> >>>>>>>>>> queries” - since the objectives of each activity are so
> > > >>>> different?
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> > > >>>> >>> multi-objective
> > > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> I will make time over the next few days to read and
> > digest
> > > >>>> your
> > > >>>> >>>>>>> paper.
> > > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process
> > to
> > > >>>> create
> > > >>>> >>>>>>>>>> something that will be useful for the broader community.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> One thing will be particularly useful: making this
> > > >>>> functionality
> > > >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
> > > >>>> with
> > > >>>> >>> this
> > > >>>> >>>>>>>>>> functionality without writing Java code or setting up
> > > complex
> > > >>>> >>>>>>> databases
> > > >>>> >>>>>>>> and
> > > >>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
> > > >>>> operations
> > > >>>> >>>>>>> that
> > > >>>> >>>>>>>> are
> > > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether
> > we
> > > >>>> could
> > > >>>> >>>>>>> devise
> > > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> Julian
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> [1]
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>
> > > >>>>
> > >
> > https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > > pkuhbt@gmail.com
> > > >>>> >
> > > >>>> >>>>>>> wrote:
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure,
> > > please
> > > >>>> >>> refer
> > > >>>> >>>>>>> to
> > > >>>> >>>>>>>>>> Fig
> > > >>>> >>>>>>>>>>> 3(a) in our paper:
> > > >>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> Best,
> > > >>>> >>>>>>>>>>> Botong
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > > >>>> taojiatao@gmail.com>
> > > >>>> >>>>>>>> wrote:
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Seems interesting, but the picture cannot be seen in the mail.
> > > >>>> >>>>>>>>>>>> Could you open a JIRA for this, so that people who are interested
> > > >>>> >>>>>>>>>>>> can subscribe to it?
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Regards!
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Aron Tao
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Botong Huang <bo...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Hi all,
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer
> > into
> > > a
> > > >>>> >>> general
> > > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research
> > paper
> > > >>>> >>>>>>> published
> > > >>>> >>>>>>>> in
> > > >>>> >>>>>>>>>>>> VLDB
> > > >>>> >>>>>>>>>>>>> 2021:
> > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
> > > >>>> >>> incremental
> > > >>>> >>>>>>>> data
> > > >>>> >>>>>>>>>>>>> processing
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
> > > >>>> Alibaba’s
> > > >>>> >>>>>>> data
> > > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
> > > >>>> optimizer
> > > >>>> >>> to
> > > >>>> >>>>>>>>>>>> alleviate
> > > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
> > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
> > > >>>> >>> Incremental
> > > >>>> >>>>>>>>>>>> Computing
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general
> > > >>>> cost-based
> > > >>>> >>>>>>>>>> incremental
> > > >>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
> > > >>>> families
> > > >>>> >>> of
> > > >>>> >>>>>>>>>>>>> incremental computing methods, including IVM,
> > Streaming,
> > > >>>> >>>>>>> DBToaster,
> > > >>>> >>>>>>>>>> etc.
> > > >>>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated
> > best
> > > >>>> plan
> > > >>>> >>> is
> > > >>>> >>>>>>>>>>>>> consistently much better than the plans from each
> > > >>>> individual
> > > >>>> >>>>>>> method
> > > >>>> >>>>>>>>>>>> alone.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to
> > > >>>> database
> > > >>>> >>>>>>> view
> > > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is
> > being
> > > >>>> >>> adopted
> > > >>>> >>>>>>> in
> > > >>>> >>>>>>>>>>>> active
> > > >>>> >>>>>>>>>>>>> databases, resumable query execution, approximate
> > query
> > > >>>> >>>>>>> processing,
> > > >>>> >>>>>>>>>> etc.
> > > >>>> >>>>>>>>>>>> We
> > > >>>> >>>>>>>>>>>>> are hoping that this feature can help widen the
> > > >>>> spectrum of
> > > >>>> >>>>>>>>>> Calcite,
> > > >>>> >>>>>>>>>>>>> and solicit more use cases and adoption of Calcite.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
> > > >>>> Please
> > > >>>> >>>>>>> refer
> > > >>>> >>>>>>>> to
> > > >>>> >>>>>>>>>>>> the
> > > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working
> > on a
> > > >>>> >>> journal
> > > >>>> >>>>>>>>>> version
> > > >>>> >>>>>>>>>>>> of
> > > >>>> >>>>>>>>>>>>> the paper with more implementation details.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant
> > > to
> > > >>>> be
> > > >>>> >>>>>>>> executed
> > > >>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo
> > will
> > > >>>> be
> > > >>>> >>>>>>> extended
> > > >>>> >>>>>>>>>> with
> > > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
> > generating
> > > >>>> >>>>>>> incremental
> > > >>>> >>>>>>>>>>>> plans
> > > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
> > different
> > > >>>> time
> > > >>>> >>>>>>> points.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time,
> > > >>>> >>>>>>>>>>>>> i.e. as a Time-Varying Relation (TVR). To achieve that, we introduced
> > > >>>> >>>>>>>>>>>>> TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
> > > >>>> >>>>>>>>>>>>> track the related RelSets of a changing table (e.g. a snapshot of the
> > > >>>> >>>>>>>>>>>>> table at a certain time, the delta of the table between two time
> > > >>>> >>>>>>>>>>>>> points, etc.).
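
Editorial sketch (not part of the original message): the idea of a TvrMetaSet as an index from time-qualified versions of one relation to RelSets can be illustrated with a small, self-contained Java class. Everything here is a hypothetical assumption made for illustration: the class name `TvrMetaSetSketch`, the string keys, the integer RelSet ids and the `canDeriveSnapshot` check are not Tempura or Calcite APIs, and the real memo structures are richer than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of a TvrMetaSet-like index. All names are illustrative
 * assumptions, not Tempura or Calcite classes.
 */
final class TvrMetaSetSketch {
  private final String tvrName;
  // Maps a time-qualified version of the TVR to the id of the RelSet holding it.
  private final Map<String, Integer> relSetIds = new HashMap<>();

  TvrMetaSetSketch(String tvrName) {
    this.tvrName = tvrName;
  }

  /** Key for the full contents of the relation at time t. */
  static String snapshot(int t) {
    return "snapshot@" + t;
  }

  /** Key for the changes to the relation between times from and to. */
  static String delta(int from, int to) {
    return "delta@" + from + ".." + to;
  }

  void register(String version, int relSetId) {
    relSetIds.put(version, relSetId);
  }

  Optional<Integer> lookup(String version) {
    return Optional.ofNullable(relSetIds.get(version));
  }

  /**
   * Match condition of one possible intra-TVR rule (an assumption): the
   * snapshot at t2 can be derived once both the snapshot at t1 and the
   * delta from t1 to t2 are registered.
   */
  boolean canDeriveSnapshot(int t1, int t2) {
    return lookup(snapshot(t1)).isPresent() && lookup(delta(t1, t2)).isPresent();
  }

  @Override public String toString() {
    return tvrName + relSetIds.keySet();
  }
}
```

For instance, after registering `snapshot(1)` and `delta(1, 2)` for a TVR named "S", `canDeriveSnapshot(1, 2)` returns true. In the proposal's terms, this is the kind of precondition a rewrite rule within a single TVR would match on.
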
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> [image: image.png]
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
> > > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal
> > > >>>> >>>>>>>>>>>>> lines represent time. Each black dot in the grid is a RelSet. Users
> > > >>>> >>>>>>>>>>>>> can write TVR Rewrite Rules to describe valid transformations between
> > > >>>> >>>>>>>>>>>>> these dots. For example, the blue lines are inter-TVR rules that
> > > >>>> >>>>>>>>>>>>> describe how to compute a certain RelSet of a TVR from RelSets of
> > > >>>> >>>>>>>>>>>>> other TVRs. The red lines are intra-TVR rules that describe
> > > >>>> >>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are logical
> > > >>>> >>>>>>>>>>>>> rules. All existing Calcite rules still work in the new volcano
> > > >>>> >>>>>>>>>>>>> system without modification.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
> > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes,
> > > >>>> >>>>>>>>>>>>> as well as links in between the nodes.
> > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
> > > >>>> >>>>>>>>>>>>> involving multiple time points.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus, when
> > > >>>> >>>>>>>>>>>>> disabled, does not change any existing Calcite behavior.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Other than the scenarios in the paper, we also applied this
> > > >>>> >>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic
> > > >>>> >>>>>>>>>>>>> query called the ‘‘range query’’ in Alibaba’s data warehouse. It
> > > >>>> >>>>>>>>>>>>> achieved cost savings of 80% on total CPU and memory consumption, and
> > > >>>> >>>>>>>>>>>>> 60% on end-to-end execution time.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Best,
> > > >>>> >>>>>>>>>>>>> Botong
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>
> > > >>>> >>>>>>> --
> > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > > >>>> >>>>>>> no mistakes
> > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > > >>>> >>>>>>>
> > > >>>> >>>>>>
> > > >>>> >>>
> > > >>>> >>
> > > >>>>
> > > >>>>
> > >
> >
> >
> > --
> > Viliam Durina
> > Jet Developer
> >       hazelcast®
> >
> >   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
> > USA
> > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
> >
> > --
> > This message contains confidential information and is intended only for
> > the
> > individuals named. If you are not the named addressee you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately by e-mail if you have received this e-mail by mistake and
> > delete this e-mail from your system. E-mail transmission cannot be
> > guaranteed to be secure or error-free as information could be intercepted,
> > corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> > The sender therefore does not accept liability for any errors or omissions
> > in the contents of this message, which arise as a result of e-mail
> > transmission. If verification is required, please request a hard-copy
> > version. -Hazelcast
> >

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
We didn't record it, but we will try to record future meetings. Please
add your time preferences to the doc so that we can find a meeting time
that works for more people.

Thanks,
Botong

On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vi...@hazelcast.com> wrote:

> Is there a recording available?
> Viliam
>
> On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com> wrote:
>
> > Hi all,
> >
> > The meeting yesterday was fun and productive. As discussed, this is the
> > call to schedule our second meeting.
> >
> > We encourage everyone to add their time preferences during 05/01 - 05/15
> > here:
> >
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >
> > Thanks,
> > Botong
> >
> > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com> wrote:
> >
> > > Hi all,
> > > We've created a zoom meeting below for our meeting next Monday
> > > (9pm-10:30pm PST on 04/26).
> > > Talk to you all soon!
> > >
> > > Join Zoom Meeting
> > > https://uci.zoom.us/j/91279732686
> > >
> > > Meeting ID: 912 7973 2686
> > > One tap mobile
> > > +16699006833,,91279732686# US (San Jose)
> > > +12532158782,,91279732686# US (Tacoma)
> > >
> > > Dial by your location
> > > +1 669 900 6833 US (San Jose)
> > > +1 253 215 8782 US (Tacoma)
> > > +1 346 248 7799 US (Houston)
> > > +1 301 715 8592 US (Washington DC)
> > > +1 312 626 6799 US (Chicago)
> > > +1 646 558 8656 US (New York)
> > > Meeting ID: 912 7973 2686
> > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > >
> > > Join by Skype for Business
> > > https://uci.zoom.us/skype/91279732686
> > >
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com>
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> According to the preferences collected, we are tentatively scheduling our
> > >> meeting for 9pm-10:30pm PST on Monday, 04/26.
> > >>
> > >> We will give a presentation about Tempura, followed by a free discussion.
> > >>
> > >> Please let us know if there are any other requests. A few days before
> > >> the meeting, I will send out a Zoom meeting link.
> > >>
> > >> Thanks,
> > >> Botong
> > >>
> > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com> wrote:
> > >>
> > >>> Hi Julian and all,
> > >>>
> > >>> We've posted the Tempura code base below. Feel free to take a quick
> > peek
> > >>> at the last five commits.
> > >>>
> > https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > >>>
> > >>> I've also opened a Jira (CALCITE-4568
> > >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will
> > serve
> > >>> as the umbrella Jira for the feature.
> > >>>
> > >>> In the meantime, we encourage everyone to enter the time preferences
> > for
> > >>> our first meeting here:
> > >>>
> > >>>
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>
> > >>> Thanks,
> > >>> Botong
> > >>>
> > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> I have added my time preferences to the doc.
> > >>>>
> > >>>> Before we meet, could you publish a PR for us to review?
> > >>>>
> > >>>> Initial discussions will need to be about architecture and
> high-level
> > >>>> design. So I would ask Calcite reviewers not to review the PR
> > line-by-line
> > >>>> (or to leave comments in GitHub) but try to understand the design
> > >>>> holistically, and prepare questions/comments before the meeting.
> > >>>>
> > >>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is
> > >>>> how we track long-running tasks such as this.
> > >>>>
> > >>>> Julian
> > >>>>
> > >>>>
> > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com>
> wrote:
> > >>>> >
> > >>>> > Hi all,
> > >>>> >
> > >>>> > Apologies for the delay. It took us some time to clean up our code base
> > >>>> > and publicly release it (which will be out soon) for a quick peek.
> > >>>> >
> > >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> > >>>> > meeting and discuss how to integrate Tempura into Calcite.
> > >>>> >
> > >>>> > Since some of our team members are in China, we prefer the time
> slot
> > >>>> of
> > >>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
> > >>>> shared
> > >>>> > doc below.
> > >>>> >
> > >>>>
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>> >
> > >>>> > We encourage everyone to add their time preferences (during
> > >>>> 04/15-04/30) in
> > >>>> > this doc. In a week or so, we will try to settle a time that works
> > for
> > >>>> > most.
> > >>>> >
> > >>>> > Thanks,
> > >>>> > Botong
> > >>>> >
> > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com>
> > >>>> wrote:
> > >>>> >
> > >>>> >> Hi Julian and Rui,
> > >>>> >>
> > >>>> >> Sounds good to us. Please give us some time to prepare some
> slides
> > >>>> for the
> > >>>> >> meeting.
> > >>>> >>
> > >>>> >> I've created a doc below for discussion. Please feel free to add
> > >>>> more in
> > >>>> >> here:
> > >>>> >>
> > >>>> >>
> > >>>>
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >>>> >>
> > >>>> >> Thanks,
> > >>>> >> Botong
> > >>>> >>
> > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> > jhyde.apache@gmail.com
> > >>>> >
> > >>>> >> wrote:
> > >>>> >>
> > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
> > >>>> think we
> > >>>> >>> should create it to continue discussion after the first meeting.
> > >>>> >>>
> > >>>> >>> Julian
> > >>>> >>>
> > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> > jhyde.apache@gmail.com>
> > >>>> >>> wrote:
> > >>>> >>>>
> > >>>> >>>> I think good next steps would be a PR and a meeting. The PR
> will
> > >>>> allow
> > >>>> >>> us to read the code, but I think we should do the first round of
> > >>>> questions
> > >>>> >>> at the meeting.  The meeting could perhaps start with a
> > >>>> presentation of the
> > >>>> >>> paper (do you have some slides you are planning to present at
> > VLDB,
> > >>>> >>> Botong?) and then move on to questions about the concepts, which
> > >>>> >>> alternatives were considered, and how the concepts map onto
> other
> > >>>> current
> > >>>> >>> and future concepts in calcite.
> > >>>> >>>>
> > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line
> at
> > >>>> this
> > >>>> >>> point. We need to understand the high-level concepts and design
> > >>>> choices. If
> > >>>> >>> we start reviewing the PR we will get lost in the details.
> > >>>> >>>>
> > >>>> >>>> I know that integrating a major change is hard; I doubt that we
> > >>>> will be
> > >>>> >>> able to integrate everything, but we can build understanding
> about
> > >>>> where
> > >>>> >>> calcite needs to go, and I hope integrate a good amount of code
> to
> > >>>> help us
> > >>>> >>> get there.
> > >>>> >>>>
> > >>>> >>>> As I said before, after the integration I would like people to be able
> > >>>> >>>> to experiment with it and use it in their production systems. That way,
> > >>>> >>>> it will not be an experiment that withers, but a feature set that
> > >>>> >>>> integrates with other Calcite features and gets stronger over time.
> > >>>> >>>>
> > >>>> >>>> Julian
> > >>>> >>>>
> > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org>
> > >>>> wrote:
> > >>>> >>>>>
> > >>>> >>>>> For me to participate in the discussion of the above questions, I will
> > >>>> >>>>> need to read a lot more to learn the relevant context, and will likely
> > >>>> >>>>> ask lots of questions :-). An editable doc is probably good for
> > >>>> >>>>> questions and back-and-forth discussion.
> > >>>> >>>>>
> > >>>> >>>>>
> > >>>> >>>>> -Rui
> > >>>> >>>>>
> > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> > amaliujia@apache.org
> > >>>> >
> > >>>> >>> wrote:
> > >>>> >>>>>>
> > >>>> >>>>>> I am also happy to help push this work into Calcite (review
> > code
> > >>>> and
> > >>>> >>> doc,
> > >>>> >>>>>> etc.).
> > >>>> >>>>>>
> > >>>> >>>>>> While you can share your code so people can get a better idea of how it
> > >>>> >>>>>> is implemented, I think it would also be nice to have a doc to discuss
> > >>>> >>>>>> the open questions above. Some points that I copy here:
> > >>>> >>>>>>
> > >>>> >>>>>> 1. Can this solution be compatible with the existing solutions in
> > >>>> >>>>>> Calcite for streaming, materialized view maintenance, and multi-query
> > >>>> >>>>>> optimization (Sigma and Delta relational operators, lattice, and Spool
> > >>>> >>>>>> operator)?
> > >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view
> > >>>> >>>>>> maintenance” and another for “user queries” - since the objectives of
> > >>>> >>>>>> each activity are so different?
> > >>>> >>>>>> 3. Will this work hasten the arrival of multi-objective parametric
> > >>>> >>>>>> query optimization [1] in Calcite?
> > >>>> >>>>>> 4. Probably SQL shell support.
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>> [1]:
> > >>>> >>>>>>
> > >>>> >>>
> > >>>>
> >
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>> -Rui
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>
> > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com>
> > >>>> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>> it would be very nice to see a POC of your work.
> > >>>> >>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> > >>>> pkuhbt@gmail.com>
> > >>>> >>> wrote:
> > >>>> >>>>>>>
> > >>>> >>>>>>>> Hi Julian,
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering
> if
> > it
> > >>>> >>> would
> > >>>> >>>>>>> help
> > >>>> >>>>>>>> to post our code for a quick preview.
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> Thanks,
> > >>>> >>>>>>>> Botong
> > >>>> >>>>>>>>
> > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> > pkuhbt@gmail.com
> > >>>> >
> > >>>> >>> wrote:
> > >>>> >>>>>>>>
> > >>>> >>>>>>>>> Hi Julian,
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best
> > >>>> >>>>>>>>> benefits the community. Here are some clarifications that hopefully
> > >>>> >>>>>>>>> answer your questions.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points at which
> > >>>> >>>>>>>>> to consider running, and a cost function that expresses their
> > >>>> >>>>>>>>> preference over time. Tempura then generates the best incremental
> > >>>> >>>>>>>>> plan, i.e. the one that minimizes the overall cost function.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can
> > >>>> >>>>>>>>> be different from each other, as opposed to identical plans in all
> > >>>> >>>>>>>>> delta runs as in streaming or IVM. As mentioned in Section 2.1 of the
> > >>>> >>>>>>>>> Tempura paper, we can mimic the current streaming implementation by
> > >>>> >>>>>>>>> specifying two (logical) time points in Tempura, representing the
> > >>>> >>>>>>>>> initial run and later delta runs respectively. In general, note that
> > >>>> >>>>>>>>> Tempura supports various forms of incremental computing, not only the
> > >>>> >>>>>>>>> small-delta append-only data model in streaming systems. That's why
> > >>>> >>>>>>>>> we believe Tempura subsumes the current streaming support, as well as
> > >>>> >>>>>>>>> any IVM implementations.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model,
> > >>>> >>>>>>>>> but rather extended the existing one. Similar to multi-objective
> > >>>> >>>>>>>>> optimization, costs incurred at different time points are considered
> > >>>> >>>>>>>>> different dimensions. Tempura lets users supply a function that
> > >>>> >>>>>>>>> converts this cost vector into a final cost. So under this function,
> > >>>> >>>>>>>>> any two incremental plans are still comparable and there is an
> > >>>> >>>>>>>>> overall optimum. I guess we can go down the route of multi-objective
> > >>>> >>>>>>>>> parametric query optimization instead if there is a need.
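
Editorial sketch (not part of the original message): the cost model described above, one cost dimension per time point plus a user-supplied function that collapses the vector into a single comparable number, can be illustrated in a few lines of Java. The class `IncrementalPlanCost`, the `weightedSum` helper and the sample numbers are hypothetical assumptions for illustration only. A weighted sum is just one possible choice of function, not necessarily the one Tempura uses.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch; not a Tempura or Calcite class. */
final class IncrementalPlanCost {
  // Cost incurred at each time point, in time order.
  private final double[] costPerTimePoint;

  IncrementalPlanCost(double... costPerTimePoint) {
    this.costPerTimePoint = costPerTimePoint.clone();
  }

  /** One possible user-supplied function: a weighted sum over time points. */
  static ToDoubleFunction<IncrementalPlanCost> weightedSum(double... weights) {
    final double[] w = weights.clone();
    return plan -> {
      if (w.length != plan.costPerTimePoint.length) {
        throw new IllegalArgumentException("one weight per time point is required");
      }
      double total = 0;
      for (int i = 0; i < w.length; i++) {
        total += w[i] * plan.costPerTimePoint[i];
      }
      return total;
    };
  }

  /** Returns whichever plan has the lower final cost under the given function. */
  static IncrementalPlanCost cheaper(IncrementalPlanCost a, IncrementalPlanCost b,
      ToDoubleFunction<IncrementalPlanCost> finalCost) {
    return finalCost.applyAsDouble(a) <= finalCost.applyAsDouble(b) ? a : b;
  }
}
```

With two time points, a plan that defers all work (costs 0 and 10) and a plan that does part of the work early (costs 6 and 3) compare differently depending on the weights. Under weights (0.5, 1.0), where early resources are cheap, the early plan wins with 6 against 10. Under weights (2.0, 1.0) the deferred plan wins with 10 against 15. This is the sense in which any two incremental plans remain comparable once the function is fixed.
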
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Next, on materialized views and multi-query optimization: since our
> > >>>> >>>>>>>>> multi-time-point plan naturally involves materializing intermediate
> > >>>> >>>>>>>>> results for later time points, we need to solve the problem of
> > >>>> >>>>>>>>> choosing materializations and include the cost of saving and reusing
> > >>>> >>>>>>>>> the materializations when costing and comparing plans. We borrowed
> > >>>> >>>>>>>>> the multi-query optimization techniques to solve this problem even
> > >>>> >>>>>>>>> though we are looking at a single query. As a result, we think our
> > >>>> >>>>>>>>> work is orthogonal to Calcite's facilities around utilizing existing
> > >>>> >>>>>>>>> views, lattice etc. We do feel that the multi-query optimization
> > >>>> >>>>>>>>> component can be adopted for wider use, but we probably need more
> > >>>> >>>>>>>>> suggestions from the community.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should
> > >>>> >>>>>>>>> be straightforward to hook it up with a SQL shell.
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> Thanks,
> > >>>> >>>>>>>>> Botong
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> > >>>> >>> jhyde.apache@gmail.com>
> > >>>> >>>>>>>>> wrote:
> > >>>> >>>>>>>>>
> > >>>> >>>>>>>>>> Botong,
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> This is very exciting; congratulations on this research,
> > and
> > >>>> thank
> > >>>> >>>>>>> you
> > >>>> >>>>>>>>>> for contributing it back to Calcite.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> > >>>> >>>>>>> materialized
> > >>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we
> have
> > >>>> already
> > >>>> >>>>>>> some
> > >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
> > >>>> operators,
> > >>>> >>>>>>> lattice,
> > >>>> >>>>>>>>>> and Spool operator), it will be interesting to see
> whether
> > >>>> we can
> > >>>> >>>>>>> make
> > >>>> >>>>>>>> them
> > >>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> Your work differs from streaming queries in that your
> > >>>> relations
> > >>>> >>> are
> > >>>> >>>>>>> used
> > >>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
> > >>>> queries, the
> > >>>> >>>>>>> only
> > >>>> >>>>>>>>>> activity is the change propagation. Did you find that you
> > >>>> needed
> > >>>> >>> two
> > >>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
> > >>>> another for
> > >>>> >>>>>>> “user
> > >>>> >>>>>>>>>> queries” - since the objectives of each activity are so
> > >>>> different?
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> > >>>> >>> multi-objective
> > >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> I will make time over the next few days to read and
> digest
> > >>>> your
> > >>>> >>>>>>> paper.
> > >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process
> to
> > >>>> create
> > >>>> >>>>>>>>>> something that will be useful for the broader community.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> One thing will be particularly useful: making this
> > >>>> functionality
> > >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
> > >>>> with
> > >>>> >>> this
> > >>>> >>>>>>>>>> functionality without writing Java code or setting up
> > complex
> > >>>> >>>>>>> databases
> > >>>> >>>>>>>> and
> > >>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
> > >>>> operations
> > >>>> >>>>>>> that
> > >>>> >>>>>>>> are
> > >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether
> we
> > >>>> could
> > >>>> >>>>>>> devise
> > >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> Julian
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>> [1]
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>
> > >>>>
> >
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> > pkuhbt@gmail.com
> > >>>> >
> > >>>> >>>>>>> wrote:
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure,
> > please
> > >>>> >>> refer
> > >>>> >>>>>>> to
> > >>>> >>>>>>>>>> Fig
> > >>>> >>>>>>>>>>> 3(a) in our paper:
> > >>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>> Botong
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> > >>>> taojiatao@gmail.com>
> > >>>> >>>>>>>> wrote:
> > >>>> >>>>>>>>>>>
> > >>>> >>>>>>>>>>>> Seems interesting, but the picture cannot be seen in the mail. Could
> > >>>> >>>>>>>>>>>> you open a JIRA for this, so that people who are interested can
> > >>>> >>>>>>>>>>>> subscribe to it?
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>> Regards!
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>> Aron Tao
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Hi all,
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer
> into
> > a
> > >>>> >>> general
> > >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research
> paper
> > >>>> >>>>>>> published
> > >>>> >>>>>>>> in
> > >>>> >>>>>>>>>>>> VLDB
> > >>>> >>>>>>>>>>>>> 2021:
> > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
> > >>>> >>> incremental
> > >>>> >>>>>>>> data
> > >>>> >>>>>>>>>>>>> processing
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
> > >>>> Alibaba’s
> > >>>> >>>>>>> data
> > >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
> > >>>> optimizer
> > >>>> >>> to
> > >>>> >>>>>>>>>>>> alleviate
> > >>>> >>>>>>>>>>>>> cluster-wise resource skewness:
> > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
> > >>>> >>> Incremental
> > >>>> >>>>>>>>>>>> Computing
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based
> > >>>> >>>>>>>>>>>>> incremental optimizer that can find the best plan across multiple
> > >>>> >>>>>>>>>>>>> families of incremental computing methods, including IVM, Streaming,
> > >>>> >>>>>>>>>>>>> DBToaster, etc. Experiments (in the paper) show that the generated
> > >>>> >>>>>>>>>>>>> best plan is consistently much better than the plans from each
> > >>>> >>>>>>>>>>>>> individual method alone.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view
> > >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in
> > >>>> >>>>>>>>>>>>> active databases, resumable query execution, approximate query
> > >>>> >>>>>>>>>>>>> processing, etc. We hope that this feature can help widen the
> > >>>> >>>>>>>>>>>>> spectrum of Calcite and attract more use cases and adoption.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
> > >>>> Please
> > >>>> >>>>>>> refer
> > >>>> >>>>>>>> to
> > >>>> >>>>>>>>>>>> the
> > >>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working
> on a
> > >>>> >>> journal
> > >>>> >>>>>>>>>> version
> > >>>> >>>>>>>>>>>> of
> > >>>> >>>>>>>>>>>>> the paper with more implementation details.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant
> > to
> > >>>> be
> > >>>> >>>>>>>> executed
> > >>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo
> will
> > >>>> be
> > >>>> >>>>>>> extended
> > >>>> >>>>>>>>>> with
> > >>>> >>>>>>>>>>>>> temporal information so that it is capable of
> generating
> > >>>> >>>>>>> incremental
> > >>>> >>>>>>>>>>>> plans
> > >>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at
> different
> > >>>> time
> > >>>> >>>>>>> points.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time,
> > >>>> >>>>>>>>>>>>> i.e. as a Time-Varying Relation (TVR). To achieve that, we introduced
> > >>>> >>>>>>>>>>>>> TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
> > >>>> >>>>>>>>>>>>> track the related RelSets of a changing table (e.g. a snapshot of the
> > >>>> >>>>>>>>>>>>> table at a certain time, the delta of the table between two time
> > >>>> >>>>>>>>>>>>> points, etc.).
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> [image: image.png]
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
> > >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal
> > >>>> >>>>>>>>>>>>> lines represent time. Each black dot in the grid is a RelSet. Users
> > >>>> >>>>>>>>>>>>> can write TVR Rewrite Rules to describe valid transformations between
> > >>>> >>>>>>>>>>>>> these dots. For example, the blue lines are inter-TVR rules that
> > >>>> >>>>>>>>>>>>> describe how to compute a certain RelSet of a TVR from RelSets of
> > >>>> >>>>>>>>>>>>> other TVRs. The red lines are intra-TVR rules that describe
> > >>>> >>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are logical
> > >>>> >>>>>>>>>>>>> rules. All existing Calcite rules still work in the new volcano
> > >>>> >>>>>>>>>>>>> system without modification.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> All changes in this feature will consist of four
> parts:
> > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet
> > and
> > >>>> >>>>>>> RelNodes,
> > >>>> >>>>>>>>>> as
> > >>>> >>>>>>>>>>>>> well as links in between the nodes.
> > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded
> > >>>> rule
> > >>>> >>>>>>> engine
> > >>>> >>>>>>>>>> API.
> > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
> > >>>> incremental
> > >>>> >>>>>>> plan
> > >>>> >>>>>>>>>>>>> involving multiple time points.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and
> > thus
> > >>>> when
> > >>>> >>>>>>>>>> disabled,
> > >>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied
> this
> > >>>> >>>>>>>>>> Calcite-extended
> > >>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic
> query
> > >>>> called
> > >>>> >>>>>>> the
> > >>>> >>>>>>>>>>>> ‘‘range
> > >>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
> > >>>> savings
> > >>>> >>> of
> > >>>> >>>>>>> 80%
> > >>>> >>>>>>>>>> on
> > >>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on
> end-to-end
> > >>>> >>> execution
> > >>>> >>>>>>>>>> time.
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and
> > happy
> > >>>> >>>>>>> holidays!
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>> Best,
> > >>>> >>>>>>>>>>>>> Botong
> > >>>> >>>>>>>>>>>>>
> > >>>> >>>>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>>>
> > >>>> >>>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>>>>>
> > >>>> >>>>>>> --
> > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > >>>> >>>>>>> no mistakes
> > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> > >>>> >>>>>>>
> > >>>> >>>>>>
> > >>>> >>>
> > >>>> >>
> > >>>>
> > >>>>
> >
>
>
> --
> Viliam Durina
> Jet Developer
>       hazelcast®
>
>   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
> USA
> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
>
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Viliam Durina <vi...@hazelcast.com>.
Is there a recording available?
Viliam

On Wed, 28 Apr 2021 at 00:15, Botong Huang <pk...@gmail.com> wrote:

> Hi all,
>
> The meeting yesterday was fun and productive. As discussed, this is the
> call to schedule our second meeting.
>
> We encourage everyone to add their time preferences during 05/01 - 05/15
> here:
>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>
> Thanks,
> Botong
>
> On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com> wrote:
>
> > Hi all,
> > We've created a zoom meeting below for our meeting next Monday
> > (9pm-10:30pm PST on 04/26).
> > Talk to you all soon!
> >
> > Join Zoom Meeting
> > https://uci.zoom.us/j/91279732686
> >
> > Meeting ID: 912 7973 2686
> > One tap mobile
> > +16699006833,,91279732686# US (San Jose)
> > +12532158782,,91279732686# US (Tacoma)
> >
> > Dial by your location
> > +1 669 900 6833 US (San Jose)
> > +1 253 215 8782 US (Tacoma)
> > +1 346 248 7799 US (Houston)
> > +1 301 715 8592 US (Washington DC)
> > +1 312 626 6799 US (Chicago)
> > +1 646 558 8656 US (New York)
> > Meeting ID: 912 7973 2686
> > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> >
> > Join by Skype for Business
> > https://uci.zoom.us/skype/91279732686
> >
> >
> > Thanks,
> > Botong
> >
> > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> According to the preferences collected, we are tentatively scheduling our
> >> meeting at 9pm-10:30pm PST on Monday, 04/26.
> >>
> >> We will give a presentation about Tempura, followed by a free discussion.
> >>
> >> Please let us know if there are any other requests. A few days before
> >> the meeting, I will send out a Zoom meeting link.
> >>
> >> Thanks,
> >> Botong
> >>
> >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com> wrote:
> >>
> >>> Hi Julian and all,
> >>>
> >>> We've posted the Tempura code base below. Feel free to take a quick peek
> >>> at the last five commits.
> >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> >>>
> >>> I've also opened a Jira (CALCITE-4568
> >>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
> >>> as the umbrella Jira for the feature.
> >>>
> >>> In the meantime, we encourage everyone to enter the time preferences for
> >>> our first meeting here:
> >>>
> >>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>
> >>> Thanks,
> >>> Botong
> >>>
> >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com>
> >>> wrote:
> >>>
> >>>> I have added my time preferences to the doc.
> >>>>
> >>>> Before we meet, could you publish a PR for us to review?
> >>>>
> >>>> Initial discussions will need to be about architecture and high-level
> >>>> design. So I would ask Calcite reviewers not to review the PR line-by-line
> >>>> (or to leave comments in GitHub) but try to understand the design
> >>>> holistically, and prepare questions/comments before the meeting.
> >>>>
> >>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is
> >>>> how we track long-running tasks such as this.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com> wrote:
> >>>> >
> >>>> > Hi all,
> >>>> >
> >>>> > Apologies for the delay. It took us some time to clean up our code base
> >>>> > and publicly release it (which will be out soon) for a quick peek.
> >>>> >
> >>>> > We are ready to present our work. Let's schedule a time for a Zoom
> >>>> > meeting and discuss how to integrate Tempura into Calcite.
> >>>> >
> >>>> > Since some of our team members are in China, we prefer the time slot of
> >>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the shared
> >>>> > doc below.
> >>>> >
> >>>> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>> >
> >>>> > We encourage everyone to add their time preferences (during 04/15-04/30) in
> >>>> > this doc. In a week or so, we will try to settle on a time that works for
> >>>> > most.
> >>>> >
> >>>> > Thanks,
> >>>> > Botong
> >>>> >
> >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com>
> >>>> wrote:
> >>>> >
> >>>> >> Hi Julian and Rui,
> >>>> >>
> >>>> >> Sounds good to us. Please give us some time to prepare some slides
> >>>> for the
> >>>> >> meeting.
> >>>> >>
> >>>> >> I've created a doc below for discussion. Please feel free to add
> >>>> more in
> >>>> >> here:
> >>>> >>
> >>>> >>
> >>>>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>>> >>
> >>>> >> Thanks,
> >>>> >> Botong
> >>>> >>
> >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <
> jhyde.apache@gmail.com
> >>>> >
> >>>> >> wrote:
> >>>> >>
> >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
> >>>> think we
> >>>> >>> should create it to continue discussion after the first meeting.
> >>>> >>>
> >>>> >>> Julian
> >>>> >>>
> >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <
> jhyde.apache@gmail.com>
> >>>> >>> wrote:
> >>>> >>>>
> >>>> >>>> I think good next steps would be a PR and a meeting. The PR will
> >>>> allow
> >>>> >>> us to read the code, but I think we should do the first round of
> >>>> questions
> >>>> >>> at the meeting.  The meeting could perhaps start with a
> >>>> presentation of the
> >>>> >>> paper (do you have some slides you are planning to present at
> VLDB,
> >>>> >>> Botong?) and then move on to questions about the concepts, which
> >>>> >>> alternatives were considered, and how the concepts map onto other
> >>>> current
> >>>> >>> and future concepts in calcite.
> >>>> >>>>
> >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at
> >>>> this
> >>>> >>> point. We need to understand the high-level concepts and design
> >>>> choices. If
> >>>> >>> we start reviewing the PR we will get lost in the details.
> >>>> >>>>
> >>>> >>>> I know that integrating a major change is hard; I doubt that we
> >>>> will be
> >>>> >>> able to integrate everything, but we can build understanding about
> >>>> where
> >>>> >>> calcite needs to go, and I hope integrate a good amount of code to
> >>>> help us
> >>>> >>> get there.
> >>>> >>>>
> >>>> >>>> As I said before, after the integration I would like people to be able
> >>>> >>>> to experiment with it and use it in their production systems. That way,
> >>>> >>>> it will not be an experiment that withers, but a feature set that
> >>>> >>>> integrates with other calcite features and gets stronger over time.
> >>>> >>>>
> >>>> >>>> Julian
> >>>> >>>>
> >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org>
> >>>> wrote:
> >>>> >>>>>
> >>>> >>>>> For me to participate in the discussion for the above questions, I will
> >>>> >>>>> need to read a lot more to know relevant context and likely ask lots of
> >>>> >>>>> questions :-). An editable doc is probably good for questions and
> >>>> >>>>> back-and-forth discussion.
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>> -Rui
> >>>> >>>>>
> >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <
> amaliujia@apache.org
> >>>> >
> >>>> >>> wrote:
> >>>> >>>>>>
> >>>> >>>>>> I am also happy to help push this work into Calcite (review code and doc,
> >>>> >>>>>> etc.).
> >>>> >>>>>>
> >>>> >>>>>> While you can share your code so people can get a better idea of how it
> >>>> >>>>>> is implemented, I think it would also be nice to have a doc to discuss
> >>>> >>>>>> the open questions above. Some points that I am copying here:
> >>>> >>>>>>
> >>>> >>>>>> 1. Can this solution be compatible with existing solutions in Calcite
> >>>> >>>>>> for streaming, materialized view maintenance, and multi-query
> >>>> >>>>>> optimization (Sigma and Delta relational operators, lattice, and Spool
> >>>> >>>>>> operator)?
> >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view
> >>>> >>>>>> maintenance” and another for “user queries” - since the objectives of
> >>>> >>>>>> each activity are so different?
> >>>> >>>>>> 3. Whether this work will hasten the arrival of multi-objective
> >>>> >>>>>> parametric query optimization [1] in Calcite.
> >>>> >>>>>> 4. Probably SQL shell support.
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> [1]:
> >>>> >>>>>>
> >>>> >>>
> >>>>
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> -Rui
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com>
> >>>> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>> it would be very nice to see a POC of your work.
> >>>> >>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
> >>>> pkuhbt@gmail.com>
> >>>> >>> wrote:
> >>>> >>>>>>>
> >>>> >>>>>>>> Hi Julian,
> >>>> >>>>>>>>
> >>>> >>>>>>>> Just wondering if there are any updates? We are wondering if
> it
> >>>> >>> would
> >>>> >>>>>>> help
> >>>> >>>>>>>> to post our code for a quick preview.
> >>>> >>>>>>>>
> >>>> >>>>>>>> Thanks,
> >>>> >>>>>>>> Botong
> >>>> >>>>>>>>
> >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <
> pkuhbt@gmail.com
> >>>> >
> >>>> >>> wrote:
> >>>> >>>>>>>>
> >>>> >>>>>>>>> Hi Julian,
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best
> >>>> >>>>>>>>> benefits the community. Here are some clarifications that hopefully
> >>>> >>>>>>>>> answer your questions.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points to consider
> >>>> >>>>>>>>> running and a cost function that expresses users' preference over time.
> >>>> >>>>>>>>> Tempura will then generate the best incremental plan that minimizes the
> >>>> >>>>>>>>> overall cost function.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can be
> >>>> >>>>>>>>> different from each other, as opposed to identical plans in all delta
> >>>> >>>>>>>>> runs as in streaming or IVM. As mentioned in Section 2.1 of the Tempura
> >>>> >>>>>>>>> paper, we can mimic the current streaming implementation by specifying
> >>>> >>>>>>>>> two (logical) time points in Tempura, representing the initial run and
> >>>> >>>>>>>>> later delta runs respectively. In general, note that Tempura supports
> >>>> >>>>>>>>> various forms of incremental computing, not only the small-delta
> >>>> >>>>>>>>> append-only data model in streaming systems. That's why we believe
> >>>> >>>>>>>>> Tempura subsumes the current streaming support, as well as any IVM
> >>>> >>>>>>>>> implementations.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model,
> >>>> >>>>>>>>> but rather extended the existing one. Similar to multi-objective
> >>>> >>>>>>>>> optimization, costs incurred at different time points are considered
> >>>> >>>>>>>>> different dimensions. Tempura lets users supply a function that
> >>>> >>>>>>>>> converts this cost vector into a final cost. So under this function,
> >>>> >>>>>>>>> any two incremental plans are still comparable and there is an overall
> >>>> >>>>>>>>> optimum. I guess we can go down the route of multi-objective parametric
> >>>> >>>>>>>>> query optimization instead if there is a need.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Next, on materialized views and multi-query optimization: since our
> >>>> >>>>>>>>> multi-time-point plan naturally involves materializing intermediate
> >>>> >>>>>>>>> results for later time points, we need to solve the problem of choosing
> >>>> >>>>>>>>> materializations and include the cost of saving and reusing the
> >>>> >>>>>>>>> materializations when costing and comparing plans. We borrowed the
> >>>> >>>>>>>>> multi-query optimization techniques to solve this problem even though
> >>>> >>>>>>>>> we are looking at a single query. As a result, we think our work is
> >>>> >>>>>>>>> orthogonal to Calcite's facilities around utilizing existing views,
> >>>> >>>>>>>>> lattice etc. We do feel that the multi-query optimization component can
> >>>> >>>>>>>>> be adopted for wider use, but we probably need more suggestions from
> >>>> >>>>>>>>> the community.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should be
> >>>> >>>>>>>>> straightforward to hook it up with a SQL shell.
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> Thanks,
> >>>> >>>>>>>>> Botong
> >>>> >>>>>>>>>
> >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> >>>> >>> jhyde.apache@gmail.com>
> >>>> >>>>>>>>> wrote:
> >>>> >>>>>>>>>
> >>>> >>>>>>>>>> Botong,
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> This is very exciting; congratulations on this research,
> and
> >>>> thank
> >>>> >>>>>>> you
> >>>> >>>>>>>>>> for contributing it back to Calcite.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
> >>>> >>>>>>> materialized
> >>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we have
> >>>> already
> >>>> >>>>>>> some
> >>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
> >>>> operators,
> >>>> >>>>>>> lattice,
> >>>> >>>>>>>>>> and Spool operator), it will be interesting to see whether
> >>>> we can
> >>>> >>>>>>> make
> >>>> >>>>>>>> them
> >>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> Your work differs from streaming queries in that your
> >>>> relations
> >>>> >>> are
> >>>> >>>>>>> used
> >>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
> >>>> queries, the
> >>>> >>>>>>> only
> >>>> >>>>>>>>>> activity is the change propagation. Did you find that you
> >>>> needed
> >>>> >>> two
> >>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
> >>>> another for
> >>>> >>>>>>> “user
> >>>> >>>>>>>>>> queries” - since the objectives of each activity are so
> >>>> different?
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
> >>>> >>> multi-objective
> >>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> I will make time over the next few days to read and digest
> >>>> your
> >>>> >>>>>>> paper.
> >>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process to
> >>>> create
> >>>> >>>>>>>>>> something that will be useful for the broader community.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> One thing will be particularly useful: making this
> >>>> functionality
> >>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
> >>>> with
> >>>> >>> this
> >>>> >>>>>>>>>> functionality without writing Java code or setting up
> complex
> >>>> >>>>>>> databases
> >>>> >>>>>>>> and
> >>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
> >>>> operations
> >>>> >>>>>>> that
> >>>> >>>>>>>> are
> >>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we
> >>>> could
> >>>> >>>>>>> devise
> >>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> Julian
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>> [1]
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>
> >>>> >>>
> >>>>
> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <
> pkuhbt@gmail.com
> >>>> >
> >>>> >>>>>>> wrote:
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure,
> please
> >>>> >>> refer
> >>>> >>>>>>> to
> >>>> >>>>>>>>>> Fig
> >>>> >>>>>>>>>>> 3(a) in our paper:
> >>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> Best,
> >>>> >>>>>>>>>>> Botong
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> >>>> taojiatao@gmail.com>
> >>>> >>>>>>>> wrote:
> >>>> >>>>>>>>>>>
> >>>> >>>>>>>>>>>> Seems interesting, but the picture cannot be seen in the mail. Could you
> >>>> >>>>>>>>>>>> open a JIRA for this, so that people who are interested can subscribe to
> >>>> >>>>>>>>>>>> it?
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> Regards!
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> Aron Tao
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bo...@apache.org> wrote:
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Hi all,
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general
> >>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper published in
> >>>> >>>>>>>>>>>>> VLDB 2021:
> >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental data
> >>>> >>>>>>>>>>>>> processing
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> >>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query optimizer to
> >>>> >>>>>>>>>>>>> alleviate cluster-wise resource skewness:
> >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> >>>> >>>>>>>>>>>>> Computing
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based
> >>>> >>>>>>>>>>>>> incremental optimizer that can find the best plan across multiple
> >>>> >>>>>>>>>>>>> families of incremental computing methods, including IVM, Streaming,
> >>>> >>>>>>>>>>>>> DBToaster, etc. Experiments (in the paper) show that the generated
> >>>> >>>>>>>>>>>>> best plan is consistently much better than the plans from each
> >>>> >>>>>>>>>>>>> individual method alone.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view
> >>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in
> >>>> >>>>>>>>>>>>> active databases, resumable query execution, approximate query
> >>>> >>>>>>>>>>>>> processing, etc. We are hoping that this feature can help widen the
> >>>> >>>>>>>>>>>>> spectrum of Calcite and solicit more use cases and adoption of Calcite.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Below is a brief description of the technical details. Please refer to
> >>>> >>>>>>>>>>>>> the Tempura paper for more details. We are also working on a journal
> >>>> >>>>>>>>>>>>> version of the paper with more implementation details.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be executed
> >>>> >>>>>>>>>>>>> all at once. In the proposal, Calcite’s memo will be extended with
> >>>> >>>>>>>>>>>>> temporal information so that it is capable of generating incremental
> >>>> >>>>>>>>>>>>> plans that include multiple sub-plans to execute at different time
> >>>> >>>>>>>>>>>>> points.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time,
> >>>> >>>>>>>>>>>>> i.e. a Time-Varying Relation (TVR). To achieve that, we introduced
> >>>> >>>>>>>>>>>>> TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
> >>>> >>>>>>>>>>>>> track the related RelSets of a changing table (e.g. a snapshot of the
> >>>> >>>>>>>>>>>>> table at a certain time, the delta of the table between two time
> >>>> >>>>>>>>>>>>> points, etc.).
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> [image: image.png]
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
> >>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
> >>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users can
> >>>> >>>>>>>>>>>>> write TVR Rewrite Rules to describe valid transformations between
> >>>> >>>>>>>>>>>>> these dots. For example, the blue lines are inter-TVR rules that
> >>>> >>>>>>>>>>>>> describe how to compute a certain RelSet of a TVR from RelSets of
> >>>> >>>>>>>>>>>>> other TVRs. The red lines are intra-TVR rules that describe
> >>>> >>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are logical rules.
> >>>> >>>>>>>>>>>>> All existing Calcite rules still work in the new volcano system
> >>>> >>>>>>>>>>>>> without modification.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
> >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet.
> >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes,
> >>>> >>>>>>>>>>>>> as well as links in between the nodes.
> >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
> >>>> >>>>>>>>>>>>> involving multiple time points.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus, when
> >>>> >>>>>>>>>>>>> disabled, does not change any existing Calcite behavior.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this
> >>>> >>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic
> >>>> >>>>>>>>>>>>> query called the ‘‘range query’’ in Alibaba’s data warehouse. It
> >>>> >>>>>>>>>>>>> achieved cost savings of 80% on total CPU and memory consumption,
> >>>> >>>>>>>>>>>>> and 60% on end-to-end execution time.
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>> Best,
> >>>> >>>>>>>>>>>>> Botong
> >>>> >>>>>>>>>>>>>
> >>>> >>>>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>>>
> >>>> >>>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>>
> >>>> >>>>>>> --
> >>>> >>>>>>> ~~~~~~~~~~~~~~~
> >>>> >>>>>>> no mistakes
> >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> >>>> >>>>>>>
> >>>> >>>>>>
> >>>> >>>
> >>>> >>
> >>>>
> >>>>
>


-- 
Viliam Durina
Jet Developer
      hazelcast®

  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
USA
+1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>


Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi all,

The meeting yesterday was fun and productive. As discussed, this is the
call to schedule our second meeting.

We encourage everyone to add their time preferences during 05/01 - 05/15
here:
https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

Thanks,
Botong

On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pk...@gmail.com> wrote:

> Hi all,
> We've created a zoom meeting below for our meeting next Monday
> (9pm-10:30pm PST on 04/26).
> Talk to you all soon!
>
> Join Zoom Meeting
> https://uci.zoom.us/j/91279732686
>
> Meeting ID: 912 7973 2686
> One tap mobile
> +16699006833,,91279732686# US (San Jose)
> +12532158782,,91279732686# US (Tacoma)
>
> Dial by your location
> +1 669 900 6833 US (San Jose)
> +1 253 215 8782 US (Tacoma)
> +1 346 248 7799 US (Houston)
> +1 301 715 8592 US (Washington DC)
> +1 312 626 6799 US (Chicago)
> +1 646 558 8656 US (New York)
> Meeting ID: 912 7973 2686
> Find your local number: https://uci.zoom.us/u/aykHTkJBh
>
> Join by Skype for Business
> https://uci.zoom.us/skype/91279732686
>
>
> Thanks,
> Botong
>
> On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com> wrote:
>
>> Hi all,
>>
>> According to the preferences collected, we are tentatively scheduling our
>> meeting at 9pm-10:30pm PST on Monday, 04/26.
>>
>> We will give a presentation about Tempura, followed by a free discussion.
>>
>> Please let us know if there are any other requests. A few days before
>> the meeting, I will send out a Zoom meeting link.
>>
>> Thanks,
>> Botong
>>
>> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com> wrote:
>>
>>> Hi Julian and all,
>>>
>>> We've posted the Tempura code base below. Feel free to take a quick peek
>>> at the last five commits.
>>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>>>
>>> I've also opened a Jira (CALCITE-4568
>>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
>>> as the umbrella Jira for the feature.
>>>
>>> In the meantime, we encourage everyone to enter the time preferences for
>>> our first meeting here:
>>>
>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>
>>> Thanks,
>>> Botong
>>>
>>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com>
>>> wrote:
>>>
>>>> I have added my time preferences to the doc.
>>>>
>>>> Before we meet, could you publish a PR for us to review?
>>>>
>>>> Initial discussions will need to be about architecture and high-level
>>>> design. So I would ask Calcite reviewers not to review the PR line-by-line
>>>> (or to leave comments in GitHub) but try to understand the design
>>>> holistically, and prepare questions/comments before the meeting.
>>>>
>>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is
>>>> how we track long-running tasks such as this.
>>>>
>>>> Julian
>>>>
>>>>
>>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Apologies for the delay. It took us some time to clean up our code base
>>>> > and publicly release it (which will be out soon) for a quick peek.
>>>> >
>>>> > We are ready to present our work. Let's schedule a time for a Zoom
>>>> > meeting and discuss how to integrate Tempura into Calcite.
>>>> >
>>>> > Since some of our team members are in China, we prefer the time slot of
>>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the shared
>>>> > doc below.
>>>> >
>>>> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>> >
>>>> > We encourage everyone to add their time preferences (during 04/15-04/30) in
>>>> > this doc. In a week or so, we will try to settle on a time that works for
>>>> > most.
>>>> >
>>>> > Thanks,
>>>> > Botong
>>>> >
>>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com>
>>>> wrote:
>>>> >
>>>> >> Hi Julian and Rui,
>>>> >>
>>>> >> Sounds good to us. Please give us some time to prepare some slides
>>>> for the
>>>> >> meeting.
>>>> >>
>>>> >> I've created a doc below for discussion. Please feel free to add
>>>> more in
>>>> >> here:
>>>> >>
>>>> >>
>>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>>> >>
>>>> >> Thanks,
>>>> >> Botong
>>>> >>
>>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jhyde.apache@gmail.com
>>>> >
>>>> >> wrote:
>>>> >>
>>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
>>>> think we
>>>> >>> should create it to continue discussion after the first meeting.
>>>> >>>
>>>> >>> Julian
>>>> >>>
>>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jh...@gmail.com>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> I think good next steps would be a PR and a meeting. The PR will
>>>> allow
>>>> >>> us to read the code, but I think we should do the first round of
>>>> questions
>>>> >>> at the meeting.  The meeting could perhaps start with a
>>>> presentation of the
>>>> >>> paper (do you have some slides you are planning to present at VLDB,
>>>> >>> Botong?) and then move on to questions about the concepts, which
>>>> >>> alternatives were considered, and how the concepts map onto other
>>>> current
>>>> >>> and future concepts in calcite.
>>>> >>>>
>>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at
>>>> this
>>>> >>> point. We need to understand the high-level concepts and design
>>>> choices. If
>>>> >>> we start reviewing the PR we will get lost in the details.
>>>> >>>>
>>>> >>>> I know that integrating a major change is hard; I doubt that we
>>>> will be
>>>> >>> able to integrate everything, but we can build understanding about
>>>> where
>>>> >>> calcite needs to go, and I hope integrate a good amount of code to
>>>> help us
>>>> >>> get there.
>>>> >>>>
>>>> >>>> As I said before, after the integration I would like people to be
>>>> able
>>>> >>> to experiment with it and use it in their production systems.  That
>>>> way, it
>>>> >>> will not be an experiment that withers, but a feature set that integrates with
>>>> >>> other calcite features and gets stronger over time.
>>>> >>>>
>>>> >>>> Julian
>>>> >>>>
>>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> For me to participate in the discussion for the above questions,
>>>> I
>>>> >>> will
>>>> >>>>> need to read a lot more to know relevant context and likely ask
>>>> lots of
>>>> >>>>> questions :-).  An editable doc is probably good for questions and
>>>> >>>>> back-and-forth discussion.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> -Rui
>>>> >>>>>
>>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <amaliujia@apache.org
>>>> >
>>>> >>> wrote:
>>>> >>>>>>
>>>> >>>>>> I am also happy to help push this work into Calcite (review code
>>>> and
>>>> >>> doc,
>>>> >>>>>> etc.).
>>>> >>>>>>
>>>> >>>>>> While you can share your code so people get a better idea of how it is
>>>> >>>>>> implemented, I think it would also be nice to have a doc to discuss the
>>>> >>>>>> open questions above. I have copied some of those points here:
>>>> >>>>>>
>>>> >>>>>> 1. Can this solution be compatible with the existing solutions in Calcite
>>>> >>>>>> for streaming, materialized view maintenance, and multi-query optimization
>>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool operator)?
>>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view
>>>> >>>>>> maintenance” and another for “user queries” - since the objectives of each
>>>> >>>>>> activity are so different?
>>>> >>>>>> 3. Will this work hasten the arrival of multi-objective parametric
>>>> >>>>>> query optimization [1] in Calcite?
>>>> >>>>>> 4. Probably SQL shell support.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> [1]:
>>>> >>>>>>
>>>> >>>
>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> -Rui
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com>
>>>> wrote:
>>>> >>>>>>>
>>>> >>>>>>> it would be very nice to see a POC of your work.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <
>>>> pkuhbt@gmail.com>
>>>> >>> wrote:
>>>> >>>>>>>
>>>> >>>>>>>> Hi Julian,
>>>> >>>>>>>>
>>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it
>>>> >>> would
>>>> >>>>>>> help
>>>> >>>>>>>> to post our code for a quick preview.
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks,
>>>> >>>>>>>> Botong
>>>> >>>>>>>>
>>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pkuhbt@gmail.com
>>>> >
>>>> >>> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>>> Hi Julian,
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan that
>>>> best
>>>> >>>>>>> benefits
>>>> >>>>>>>>> the community. Here are some clarifications that hopefully
>>>> answer
>>>> >>> your
>>>> >>>>>>>>> questions.
>>>> >>>>>>>>>
>>>> >>>>>>>>> In our work (Tempura), users specify the set of time points at which
>>>> >>>>>>>>> to consider running and a cost function that expresses users' preference
>>>> >>>>>>>>> over time. Tempura then generates the best incremental plan, that is,
>>>> >>>>>>>>> the one that minimizes the overall cost function.
>>>> >>>>>>>>>
>>>> >>>>>>>>> In this incremental plan, the sub-plans at different time
>>>> points
>>>> >>> can
>>>> >>>>>>> be
>>>> >>>>>>>>> different from each other, as opposed to identical plans in
>>>> all
>>>> >>> delta
>>>> >>>>>>>> runs
>>>> >>>>>>>>> as in streaming or IVM. As mentioned in §2.1 of the Tempura
>>>> paper,
>>>> >>> we
>>>> >>>>>>> can
>>>> >>>>>>>>> mimic the current streaming implementation by specifying two
>>>> >>> (logical)
>>>> >>>>>>>> time
>>>> >>>>>>>>> points in Tempura, representing the initial run and later
>>>> delta
>>>> >>> runs
>>>> >>>>>>>>> respectively. In general, note that Tempura supports various forms of
>>>> >>>>>>>>> incremental computing, not only the small-delta append-only
>>>> data
>>>> >>>>>>> model in
>>>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes the
>>>> >>> current
>>>> >>>>>>>>> streaming support, as well as any IVM implementations.
>>>> >>>>>>>>>
>>>> >>>>>>>>> About the cost model, we did not come up with a separate cost
>>>> >>> model,
>>>> >>>>>>> but
>>>> >>>>>>>>> rather extended the existing one. Similar to multi-objective
>>>> >>>>>>>> optimization,
>>>> >>>>>>>>> costs incurred at different time points are considered
>>>> different
>>>> >>>>>>>>> dimensions. Tempura lets users supply a function that
>>>> converts this
>>>> >>>>>>> cost
>>>> >>>>>>>>> vector into a final cost. So under this function, any two
>>>> >>> incremental
>>>> >>>>>>>> plans
>>>> >>>>>>>>> are still comparable and there is an overall optimum. I guess
>>>> we
>>>> >>> can
>>>> >>>>>>> go
>>>> >>>>>>>>> down the route of multi-objective parametric query
>>>> optimization
>>>> >>>>>>> instead
>>>> >>>>>>>> if
>>>> >>>>>>>>> there is a need.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Next on materialized views and multi-query optimization,
>>>> since our
>>>> >>>>>>>>> multi-time-point plan naturally involves materializing
>>>> intermediate
>>>> >>>>>>>> results
>>>> >>>>>>>>> for later time points, we need to solve the problem of
>>>> choosing
>>>> >>>>>>>>> materializations and include the cost of saving and reusing
>>>> the
>>>> >>>>>>>>> materializations when costing and comparing plans. We
>>>> borrowed the
>>>> >>>>>>>>> multi-query optimization techniques to solve this problem even
>>>> >>> though
>>>> >>>>>>> we
>>>> >>>>>>>>> are looking at a single query. As a result, we think our work
>>>> is
>>>> >>>>>>>> orthogonal
>>>> >>>>>>>>> to Calcite's facilities around utilizing existing views,
>>>> lattice
>>>> >>> etc.
>>>> >>>>>>> We
>>>> >>>>>>>> do
>>>> >>>>>>>>> feel that the multi-query optimization component can be
>>>> adopted to
>>>> >>>>>>> wider
>>>> >>>>>>>>> use, but probably need more suggestions from the community.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Lastly, our current implementation is set up in Java code, so it should
>>>> >>>>>>>>> be straightforward to hook it up to a SQL shell.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Thanks,
>>>> >>>>>>>>> Botong
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>>>> >>> jhyde.apache@gmail.com>
>>>> >>>>>>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Botong,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> This is very exciting; congratulations on this research, and
>>>> thank
>>>> >>>>>>> you
>>>> >>>>>>>>>> for contributing it back to Calcite.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
>>>> >>>>>>> materialized
>>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we have
>>>> already
>>>> >>>>>>> some
>>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
>>>> operators,
>>>> >>>>>>> lattice,
>>>> >>>>>>>>>> and Spool operator), it will be interesting to see whether
>>>> we can
>>>> >>>>>>> make
>>>> >>>>>>>> them
>>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Your work differs from streaming queries in that your
>>>> relations
>>>> >>> are
>>>> >>>>>>> used
>>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
>>>> queries, the
>>>> >>>>>>> only
>>>> >>>>>>>>>> activity is the change propagation. Did you find that you
>>>> needed
>>>> >>> two
>>>> >>>>>>>>>> separate cost models - one for “view maintenance” and
>>>> another for
>>>> >>>>>>> “user
>>>> >>>>>>>>>> queries” - since the objectives of each activity are so
>>>> different?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
>>>> >>> multi-objective
>>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I will make time over the next few days to read and digest
>>>> your
>>>> >>>>>>> paper.
>>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process to
>>>> create
>>>> >>>>>>>>>> something that will be useful for the broader community.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> One thing will be particularly useful: making this
>>>> functionality
>>>> >>>>>>>>>> available from a SQL shell, so that people can experiment
>>>> with
>>>> >>> this
>>>> >>>>>>>>>> functionality without writing Java code or setting up complex
>>>> >>>>>>> databases
>>>> >>>>>>>> and
>>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
>>>> operations
>>>> >>>>>>> that
>>>> >>>>>>>> are
>>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we
>>>> could
>>>> >>>>>>> devise
>>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Julian
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> [1]
>>>> >>>>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>
>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pkuhbt@gmail.com
>>>> >
>>>> >>>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please
>>>> >>> refer
>>>> >>>>>>> to
>>>> >>>>>>>>>> Fig
>>>> >>>>>>>>>>> 3(a) in our paper:
>>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best,
>>>> >>>>>>>>>>> Botong
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>>>> taojiatao@gmail.com>
>>>> >>>>>>>> wrote:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you
>>>> >>>>>>>>>>>> open a JIRA for this, so that people who are interested can subscribe
>>>> >>>>>>>>>>>> to it?
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Regards!
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Aron Tao
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>> Botong Huang <bo...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Hi all,
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
>>>> >>> general
>>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper
>>>> >>>>>>> published
>>>> >>>>>>>> in
>>>> >>>>>>>>>>>> VLDB
>>>> >>>>>>>>>>>>> 2021:
>>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
>>>> >>> incremental
>>>> >>>>>>>> data
>>>> >>>>>>>>>>>>> processing
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
>>>> Alibaba’s
>>>> >>>>>>> data
>>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
>>>> optimizer
>>>> >>> to
>>>> >>>>>>>>>>>> alleviate
>>>> >>>>>>>>>>>>> cluster-wise resource skewness:
>>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
>>>> >>> Incremental
>>>> >>>>>>>>>>>> Computing
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general
>>>> cost-based
>>>> >>>>>>>>>> incremental
>>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
>>>> families
>>>> >>> of
>>>> >>>>>>>>>>>>> incremental computing methods, including IVM, Streaming,
>>>> >>>>>>> DBToaster,
>>>> >>>>>>>>>> etc.
>>>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated best
>>>> plan
>>>> >>> is
>>>> >>>>>>>>>>>>> consistently much better than the plans from each
>>>> individual
>>>> >>>>>>> method
>>>> >>>>>>>>>>>> alone.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view
>>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in
>>>> >>>>>>>>>>>>> active databases, resumable query execution, approximate query
>>>> >>>>>>>>>>>>> processing, etc. We hope that this feature can help widen the scope
>>>> >>>>>>>>>>>>> of Calcite and attract more use cases and adoption.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
>>>> Please
>>>> >>>>>>> refer
>>>> >>>>>>>> to
>>>> >>>>>>>>>>>> the
>>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working on a
>>>> >>> journal
>>>> >>>>>>>>>> version
>>>> >>>>>>>>>>>> of
>>>> >>>>>>>>>>>>> the paper with more implementation details.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Currently, the query plan generated by Calcite is meant to be executed
>>>> >>>>>>>>>>>>> all at once. In the proposal, Calcite’s memo will be extended with
>>>> >>>>>>>>>>>>> temporal information so that it is capable of generating incremental
>>>> >>>>>>>>>>>>> plans that include multiple sub-plans to execute at different time
>>>> >>>>>>>>>>>>> points.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time,
>>>> >>>>>>>>>>>>> i.e. as a Time-Varying Relation (TVR). To achieve that, we introduced
>>>> >>>>>>>>>>>>> TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
>>>> >>>>>>>>>>>>> track the related RelSets of a changing table (e.g. a snapshot of the
>>>> >>>>>>>>>>>>> table at a certain time, the delta of the table between two time
>>>> >>>>>>>>>>>>> points, etc.).
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> [image: image.png]
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
>>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
>>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users can
>>>> >>>>>>>>>>>>> write TVR Rewrite Rules to describe valid transformations between
>>>> >>>>>>>>>>>>> these dots. For example, the blue lines are inter-TVR rules that
>>>> >>>>>>>>>>>>> describe how to compute a certain RelSet of a TVR from RelSets of
>>>> >>>>>>>>>>>>> other TVRs. The red lines are intra-TVR rules that describe
>>>> >>>>>>>>>>>>> transformations within a TVR. All TVR rewrite rules are logical
>>>> >>>>>>>>>>>>> rules. All existing Calcite rules still work in the new volcano
>>>> >>>>>>>>>>>>> system without modification.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> The changes for this feature consist of four parts:
>>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet.
>>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes,
>>>> >>>>>>>>>>>>> as well as the links between the nodes.
>>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
>>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
>>>> >>>>>>>>>>>>> involving multiple time points.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus
>>>> when
>>>> >>>>>>>>>> disabled,
>>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Other than the scenarios in the paper, we also applied this
>>>> >>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic
>>>> >>>>>>>>>>>>> query called the “range query” in Alibaba’s data warehouse. It
>>>> >>>>>>>>>>>>> achieved cost savings of 80% on total CPU and memory consumption, and
>>>> >>>>>>>>>>>>> 60% on end-to-end execution time.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
>>>> >>>>>>> holidays!
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>> Botong
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> --
>>>> >>>>>>> ~~~~~~~~~~~~~~~
>>>> >>>>>>> no mistakes
>>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>
>>>> >>
>>>>
>>>>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi all,
We've set up a Zoom meeting for next Monday (9pm-10:30pm PST on 04/26);
the details are below.
Talk to you all soon!

Join Zoom Meeting
https://uci.zoom.us/j/91279732686

Meeting ID: 912 7973 2686
One tap mobile
+16699006833,,91279732686# US (San Jose)
+12532158782,,91279732686# US (Tacoma)

Dial by your location
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 646 558 8656 US (New York)
Meeting ID: 912 7973 2686
Find your local number: https://uci.zoom.us/u/aykHTkJBh

Join by Skype for Business
https://uci.zoom.us/skype/91279732686


Thanks,
Botong

On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pk...@gmail.com> wrote:

> Hi all,
>
> According to the preferences collected, we are tentatively scheduling our
> meeting at 9pm-10:30pm PST on 04/26 Monday.
>
> We will give a presentation about Tempura, followed by a free discussion.
>
> Please let us know if there are any other requests. A few days before
> the meeting, I will send out a Zoom meeting link.
>
> Thanks,
> Botong
>
> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com> wrote:
>
>> Hi Julian and all,
>>
>> We've posted the Tempura code base below. Feel free to take a quick peek
>> at the last five commits.
>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>>
>> I've also opened a Jira (CALCITE-4568
>> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
>> as the umbrella Jira for the feature.
>>
>> In the meantime, we encourage everyone to enter the time preferences for
>> our first meeting here:
>>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>
>> Thanks,
>> Botong
>>
>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com>
>> wrote:
>>
>>> I have added my time preferences to the doc.
>>>
>>> Before we meet, could you publish a PR for us to review?
>>>
>>> Initial discussions will need to be about architecture and high-level
>>> design. So I would ask Calcite reviewers not to review the PR line-by-line
>>> (or to leave comments in GitHub) but try to understand the design
>>> holistically, and prepare questions/comments before the meeting.
>>>
>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is
>>> how we track long-running tasks such as this.
>>>
>>> Julian
>>>
>>>
>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Apologies for the delay. It took us some time to clean up our code base
>>> and
>>> > publicly release it (which will be out soon) for a quick peek.
>>> >
>>> > We are ready to present our work. Let's schedule a time for a Zoom
>>> > meeting and discuss how to integrate Tempura into Calcite.
>>> >
>>> > Since some of our team members are in China, we prefer the time slot of
>>> > 7:00pm-11:30pm PST any day. I've added our time preference in the
>>> shared
>>> > doc below.
>>> >
>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>> >
>>> > We encourage everyone to add their time preferences (during
>>> 04/15-04/30) in
>>> > this doc. In a week or so, we will try to settle a time that works for
>>> > most.
>>> >
>>> > Thanks,
>>> > Botong
>>> >
>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com> wrote:
>>> >
>>> >> Hi Julian and Rui,
>>> >>
>>> >> Sounds good to us. Please give us some time to prepare some slides
>>> for the
>>> >> meeting.
>>> >>
>>> >> I've created a doc below for discussion. Please feel free to add more
>>> in
>>> >> here:
>>> >>
>>> >>
>>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>>> >>
>>> >> Thanks,
>>> >> Botong
>>> >>
>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jh...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I
>>> think we
>>> >>> should create it to continue discussion after the first meeting.
>>> >>>
>>> >>> Julian
>>> >>>
>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jh...@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> I think good next steps would be a PR and a meeting. The PR will
>>> allow
>>> >>> us to read the code, but I think we should do the first round of
>>> questions
>>> >>> at the meeting.  The meeting could perhaps start with a presentation
>>> of the
>>> >>> paper (do you have some slides you are planning to present at VLDB,
>>> >>> Botong?) and then move on to questions about the concepts, which
>>> >>> alternatives were considered, and how the concepts map onto other
>>> current
>>> >>> and future concepts in calcite.
>>> >>>>
>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at
>>> this
>>> >>> point. We need to understand the high-level concepts and design
>>> choices. If
>>> >>> we start reviewing the PR we will get lost in the details.
>>> >>>>
>>> >>>> I know that integrating a major change is hard; I doubt that we
>>> will be
>>> >>> able to integrate everything, but we can build understanding about
>>> where
>>> >>> calcite needs to go, and I hope integrate a good amount of code to
>>> help us
>>> >>> get there.
>>> >>>>
>>> >>>> As I said before, after the integration I would like people to be
>>> able
>>> >>> to experiment with it and use it in their production systems.  That
>>> way, it
>>> >>> will not be an experiment that withers, but a feature set that integrates
>>> with
>>> >>> other calcite features and gets stronger over time.
>>> >>>>
>>> >>>> Julian
>>> >>>>
>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org>
>>> wrote:
>>> >>>>>
>>> >>>>> For me to participate in the discussion for the above questions, I
>>> >>> will
>>> >>>>> need to read a lot more to know relevant context and likely ask
>>> lots of
>>> >>>>> questions :-).  An editable doc is probably good for questions and
>>> >>>>> back-and-forth discussion.
>>> >>>>>
>>> >>>>>
>>> >>>>> -Rui
>>> >>>>>
>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <am...@apache.org>
>>> >>> wrote:
>>> >>>>>>
>>> >>>>>> I am also happy to help push this work into Calcite (review code
>>> and
>>> >>> doc,
>>> >>>>>> etc.).
>>> >>>>>>
>>> >>>>>> While you can share your code so people get a better idea of how it is
>>> >>>>>> implemented, I think it would also be nice to have a doc to discuss the
>>> >>>>>> open questions above. I have copied some of those points here:
>>> >>>>>>
>>> >>>>>> 1. Can this solution be compatible with the existing solutions in Calcite
>>> >>>>>> for streaming, materialized view maintenance, and multi-query optimization
>>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool operator)?
>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view
>>> >>>>>> maintenance” and another for “user queries” - since the objectives of each
>>> >>>>>> activity are so different?
>>> >>>>>> 3. Will this work hasten the arrival of multi-objective parametric
>>> >>>>>> query optimization [1] in Calcite?
>>> >>>>>> 4. Probably SQL shell support.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> [1]:
>>> >>>>>>
>>> >>>
>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> -Rui
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> it would be very nice to see a POC of your work.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pkuhbt@gmail.com
>>> >
>>> >>> wrote:
>>> >>>>>>>
>>> >>>>>>>> Hi Julian,
>>> >>>>>>>>
>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it
>>> >>> would
>>> >>>>>>> help
>>> >>>>>>>> to post our code for a quick preview.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Botong
>>> >>>>>>>>
>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pk...@gmail.com>
>>> >>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Hi Julian,
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan that
>>> best
>>> >>>>>>> benefits
>>> >>>>>>>>> the community. Here are some clarifications that hopefully
>>> answer
>>> >>> your
>>> >>>>>>>>> questions.
>>> >>>>>>>>>
>>> >>>>>>>>> In our work (Tempura), users specify the set of time points at which
>>> >>>>>>>>> to consider running and a cost function that expresses users' preference
>>> >>>>>>>>> over time. Tempura then generates the best incremental plan, that is,
>>> >>>>>>>>> the one that minimizes the overall cost function.
>>> >>>>>>>>>
>>> >>>>>>>>> In this incremental plan, the sub-plans at different time
>>> points
>>> >>> can
>>> >>>>>>> be
>>> >>>>>>>>> different from each other, as opposed to identical plans in all
>>> >>> delta
>>> >>>>>>>> runs
>>> >>>>>>>>> as in streaming or IVM. As mentioned in §2.1 of the Tempura
>>> paper,
>>> >>> we
>>> >>>>>>> can
>>> >>>>>>>>> mimic the current streaming implementation by specifying two
>>> >>> (logical)
>>> >>>>>>>> time
>>> >>>>>>>>> points in Tempura, representing the initial run and later delta
>>> >>> runs
>>> >>>>>>>>> respectively. In general, note that Tempura supports various forms of
>>> >>>>>>>>> incremental computing, not only the small-delta append-only
>>> data
>>> >>>>>>> model in
>>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes the
>>> >>> current
>>> >>>>>>>>> streaming support, as well as any IVM implementations.
>>> >>>>>>>>>
>>> >>>>>>>>> About the cost model, we did not come up with a separate cost
>>> >>> model,
>>> >>>>>>> but
>>> >>>>>>>>> rather extended the existing one. Similar to multi-objective
>>> >>>>>>>> optimization,
>>> >>>>>>>>> costs incurred at different time points are considered
>>> different
>>> >>>>>>>>> dimensions. Tempura lets users supply a function that converts
>>> this
>>> >>>>>>> cost
>>> >>>>>>>>> vector into a final cost. So under this function, any two
>>> >>> incremental
>>> >>>>>>>> plans
>>> >>>>>>>>> are still comparable and there is an overall optimum. I guess
>>> we
>>> >>> can
>>> >>>>>>> go
>>> >>>>>>>>> down the route of multi-objective parametric query optimization
>>> >>>>>>> instead
>>> >>>>>>>> if
>>> >>>>>>>>> there is a need.
>>> >>>>>>>>>
>>> >>>>>>>>> Next on materialized views and multi-query optimization, since
>>> our
>>> >>>>>>>>> multi-time-point plan naturally involves materializing
>>> intermediate
>>> >>>>>>>> results
>>> >>>>>>>>> for later time points, we need to solve the problem of choosing
>>> >>>>>>>>> materializations and include the cost of saving and reusing the
>>> >>>>>>>>> materializations when costing and comparing plans. We borrowed
>>> the
>>> >>>>>>>>> multi-query optimization techniques to solve this problem even
>>> >>> though
>>> >>>>>>> we
>>> >>>>>>>>> are looking at a single query. As a result, we think our work
>>> is
>>> >>>>>>>> orthogonal
>>> >>>>>>>>> to Calcite's facilities around utilizing existing views,
>>> lattice
>>> >>> etc.
>>> >>>>>>> We
>>> >>>>>>>> do
>>> >>>>>>>>> feel that the multi-query optimization component can be
>>> adopted to
>>> >>>>>>> wider
>>> >>>>>>>>> use, but probably need more suggestions from the community.
>>> >>>>>>>>>
>>> >>>>>>>>> Lastly, our current implementation is set up in Java code, so it should
>>> >>>>>>>>> be straightforward to hook it up to a SQL shell.
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks,
>>> >>>>>>>>> Botong
>>> >>>>>>>>>
>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>>> >>> jhyde.apache@gmail.com>
>>> >>>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Botong,
>>> >>>>>>>>>>
>>> >>>>>>>>>> This is very exciting; congratulations on this research, and
>>> thank
>>> >>>>>>> you
>>> >>>>>>>>>> for contributing it back to Calcite.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
>>> >>>>>>> materialized
>>> >>>>>>>>>> view maintenance, and multi-query optimization. As we have
>>> already
>>> >>>>>>> some
>>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational
>>> operators,
>>> >>>>>>> lattice,
>>> >>>>>>>>>> and Spool operator), it will be interesting to see whether we
>>> can
>>> >>>>>>> make
>>> >>>>>>>> them
>>> >>>>>>>>>> compatible, or whether one concept can subsume others.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Your work differs from streaming queries in that your
>>> relations
>>> >>> are
>>> >>>>>>> used
>>> >>>>>>>>>> by “external” user queries, whereas in pure streaming
>>> queries, the
>>> >>>>>>> only
>>> >>>>>>>>>> activity is the change propagation. Did you find that you
>>> needed
>>> >>> two
>>> >>>>>>>>>> separate cost models - one for “view maintenance” and another
>>> for
>>> >>>>>>> “user
>>> >>>>>>>>>> queries” - since the objectives of each activity are so
>>> different?
>>> >>>>>>>>>>
>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
>>> >>> multi-objective
>>> >>>>>>>>>> parametric query optimization [1] in Calcite.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I will make time over the next few days to read and digest
>>> your
>>> >>>>>>> paper.
>>> >>>>>>>>>> Then I expect that we will have a back-and-forth process to
>>> create
>>> >>>>>>>>>> something that will be useful for the broader community.
>>> >>>>>>>>>>
>>> >>>>>>>>>> One thing will be particularly useful: making this
>>> functionality
>>> >>>>>>>>>> available from a SQL shell, so that people can experiment with
>>> >>> this
>>> >>>>>>>>>> functionality without writing Java code or setting up complex
>>> >>>>>>> databases
>>> >>>>>>>> and
>>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
>>> operations
>>> >>>>>>> that
>>> >>>>>>>> are
>>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we
>>> could
>>> >>>>>>> devise
>>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Julian
>>> >>>>>>>>>>
>>> >>>>>>>>>> [1]
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>
>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pk...@gmail.com>
>>> >>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please
>>> >>> refer
>>> >>>>>>> to
>>> >>>>>>>>>> Fig
>>> >>>>>>>>>>> 3(a) in our paper:
>>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Best,
>>> >>>>>>>>>>> Botong
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>>> taojiatao@gmail.com>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> Seems interesting, but the picture cannot be seen in the mail.
>>> >>>>>>>>>>>> Could you open a JIRA for this, so that people who are
>>> >>>>>>>>>>>> interested can subscribe to it?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Regards!
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Aron Tao
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Botong Huang <bo...@apache.org> 于2020年12月24日周四 上午3:18写道:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>> Hi all,
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
>>> >>> general
>>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper
>>> >>>>>>> published
>>> >>>>>>>> in
>>> >>>>>>>>>>>> VLDB
>>> >>>>>>>>>>>>> 2021:
>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
>>> >>> incremental
>>> >>>>>>>> data
>>> >>>>>>>>>>>>> processing
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
>>> Alibaba’s
>>> >>>>>>> data
>>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
>>> optimizer
>>> >>> to
>>> >>>>>>>>>>>> alleviate
>>> >>>>>>>>>>>>> cluster-wise resource skewness:
>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
>>> >>> Incremental
>>> >>>>>>>>>>>> Computing
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based
>>> >>>>>>>>>> incremental
>>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
>>> families
>>> >>> of
>>> >>>>>>>>>>>>> incremental computing methods, including IVM, Streaming,
>>> >>>>>>> DBToaster,
>>> >>>>>>>>>> etc.
>>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated best
>>> plan
>>> >>> is
>>> >>>>>>>>>>>>> consistently much better than the plans from each
>>> individual
>>> >>>>>>> method
>>> >>>>>>>>>>>> alone.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> In general, incremental query planning is central to
>>> database
>>> >>>>>>> view
>>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being
>>> >>> adopted
>>> >>>>>>> in
>>> >>>>>>>>>>>> active
>>> >>>>>>>>>>>>> databases, resumable query execution, approximate query
>>> >>>>>>> processing,
>>> >>>>>>>>>> etc.
>>> >>>>>>>>>>>> We
>>> >>>>>>>>>>>>> are hoping that this feature can help widen the
>>> spectrum of
>>> >>>>>>>>>> Calcite,
>>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Below is a brief description of the technical details.
>>> Please
>>> >>>>>>> refer
>>> >>>>>>>> to
>>> >>>>>>>>>>>> the
>>> >>>>>>>>>>>>> Tempura paper for more details. We are also working on a
>>> >>> journal
>>> >>>>>>>>>> version
>>> >>>>>>>>>>>> of
>>> >>>>>>>>>>>>> the paper with more implementation details.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to
>>> be
>>> >>>>>>>> executed
>>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo will be
>>> >>>>>>> extended
>>> >>>>>>>>>> with
>>> >>>>>>>>>>>>> temporal information so that it is capable of generating
>>> >>>>>>> incremental
>>> >>>>>>>>>>>> plans
>>> >>>>>>>>>>>>> that include multiple sub-plans to execute at different
>>> time
>>> >>>>>>> points.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes
>>> over
>>> >>> time
>>> >>>>>>>>>> (Time
>>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we introduced
>>> >>>>>>> TvrMetaSet
>>> >>>>>>>>>> into
>>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track
>>> related
>>> >>>>>>> RelSets
>>> >>>>>>>>>> of a
>>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain time,
>>> >>>>>>> delta of
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>>> table between two time points, etc.).
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> [image: image.png]
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> For example in the above figure, each vertical line is a
>>> >>>>>>> TvrMetaSet
>>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
>>> >>> Horizontal
>>> >>>>>>>> lines
>>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet.
>>> Users
>>> >>> can
>>> >>>>>>>>>> write
>>> >>>>>>>>>>>> TVR
>>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations between
>>> these
>>> >>>>>>> dots.
>>> >>>>>>>>>> For
>>> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules that describe
>>> how
>>> >>> to
>>> >>>>>>>>>> compute
>>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The red
>>> >>> lines
>>> >>>>>>>> are
>>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a
>>> TVR. All
>>> >>>>>>> TVR
>>> >>>>>>>>>>>> rewrite
>>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still
>>> work
>>> >>> in
>>> >>>>>>>> the
>>> >>>>>>>>>> new
>>> >>>>>>>>>>>>> volcano system without modification.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
>>> >>>>>>> RelNodes,
>>> >>>>>>>>>> as
>>> >>>>>>>>>>>>> well as links in between the nodes.
>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule
>>> >>>>>>> engine
>>> >>>>>>>>>> API.
>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
>>> incremental
>>> >>>>>>> plan
>>> >>>>>>>>>>>>> involving multiple time points.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus
>>> when
>>> >>>>>>>>>> disabled,
>>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this
>>> >>>>>>>>>> Calcite-extended
>>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic query
>>> called
>>> >>>>>>> the
>>> >>>>>>>>>>>> ‘‘range
>>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
>>> savings
>>> >>> of
>>> >>>>>>> 80%
>>> >>>>>>>>>> on
>>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on end-to-end
>>> >>> execution
>>> >>>>>>>>>> time.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
>>> >>>>>>> holidays!
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Best,
>>> >>>>>>>>>>>>> Botong
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> ~~~~~~~~~~~~~~~
>>> >>>>>>> no mistakes
>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
>>> >>>>>>>
>>> >>>>>>
>>> >>>
>>> >>
>>>
>>>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi all,

Based on the preferences collected, we are tentatively scheduling our
meeting for 9pm-10:30pm PST on Monday, 04/26.

We will give a presentation about Tempura, followed by a free discussion.

Please let us know if there are any other requests. A few days before
the meeting, I will send out a Zoom meeting link.

Thanks,
Botong

On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pk...@gmail.com> wrote:

> Hi Julian and all,
>
> We've posted the Tempura code base below. Feel free to take a quick peek
> at the last five commits.
> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
>
> I've also opened a Jira (CALCITE-4568
> <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve
> as the umbrella Jira for the feature.
>
> In the meantime, we encourage everyone to enter the time preferences for
> our first meeting here:
>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>
> Thanks,
> Botong
>
> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com> wrote:
>
>> I have added my time preferences to the doc.
>>
>> Before we meet, could you publish a PR for us to review?
>>
>> Initial discussions will need to be about architecture and high-level
>> design. So I would ask Calcite reviewers not to review the PR line-by-line
>> (or to leave comments in GitHub) but try to understand the design
>> holistically, and prepare questions/comments before the meeting.
>>
>> Botong, can you please create a Calcite JIRA case for this task? JIRA is how
>> we track long-running tasks such as this.
>>
>> Julian
>>
>>
>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > Apologies for the delay. It took us some time to clean up our code base
>> and
>> > publicly release it (which will be out soon) for a quick peek.
>> >
>> > We are ready to present our work. Let's schedule a time for a Zoom
>> > meeting and discuss how to integrate Tempura into Calcite.
>> >
>> > Since some of our team members are in China, we prefer the time slot of
>> > 7:00pm-11:30pm PST any day. I've added our time preference in the shared
>> > doc below.
>> >
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >
>> > We encourage everyone to add their time preferences (during
>> 04/15-04/30) in
>> > this doc. In a week or so, we will try to settle a time that works for
>> > most.
>> >
>> > Thanks,
>> > Botong
>> >
>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com> wrote:
>> >
>> >> Hi Julian and Rui,
>> >>
>> >> Sounds good to us. Please give us some time to prepare some slides for
>> the
>> >> meeting.
>> >>
>> >> I've created a doc below for discussion. Please feel free to add more
>> in
>> >> here:
>> >>
>> >>
>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
>> >>
>> >> Thanks,
>> >> Botong
>> >>
>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jh...@gmail.com>
>> >> wrote:
>> >>
>> >>> PS The “editable doc” that Rui refers to is also a good idea. I think
>> we
>> >>> should create it to continue discussion after the first meeting.
>> >>>
>> >>> Julian
>> >>>
>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jh...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> I think good next steps would be a PR and a meeting. The PR will
>> allow
>> >>> us to read the code, but I think we should do the first round of
>> questions
>> >>> at the meeting.  The meeting could perhaps start with a presentation
>> of the
>> >>> paper (do you have some slides you are planning to present at VLDB,
>> >>> Botong?) and then move on to questions about the concepts, which
>> >>> alternatives were considered, and how the concepts map onto other
>> current
>> >>> and future concepts in calcite.
>> >>>>
>> >>>> I don’t think we should start “reviewing” the PR line-by-line at this
>> >>> point. We need to understand the high-level concepts and design
>> choices. If
>> >>> we start reviewing the PR we will get lost in the details.
>> >>>>
>> >>>> I know that integrating a major change is hard; I doubt that we will
>> be
>> >>> able to integrate everything, but we can build understanding about
>> where
>> >>> calcite needs to go, and I hope integrate a good amount of code to
>> help us
>> >>> get there.
>> >>>>
>> >>>> As I said before, after the integration I would like people to be
>> able
>> >>> to experiment with it and use it in their production systems.  That
>> way, it
>> >>> will not be an experiment that withers, but a feature set that integrates
>> with
>> >>> other calcite features and gets stronger over time.
>> >>>>
>> >>>> Julian
>> >>>>
>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org>
>> wrote:
>> >>>>>
>> >>>>> For me to participate in the discussion for the above questions, I
>> >>> will
>> >>>>> need to read a lot more to know relevant context and likely ask
>> lots of
>> >>>>> questions :-).  An editable doc is probably good for questions and
>> >>>>> back-and-forth discussion.
>> >>>>>
>> >>>>>
>> >>>>> -Rui
>> >>>>>
>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <am...@apache.org>
>> >>> wrote:
>> >>>>>>
>> >>>>>> I am also happy to help push this work into Calcite (review code
>> and
>> >>> doc,
>> >>>>>> etc.).
>> >>>>>>
>> >>>>>> While you can share your code so people can have more idea how it
>> is
>> >>>>>> implemented, I think it would be also nice to have a doc to discuss
>> >>> open
>> >>>>>> questions above. Some points that I copied here:
>> >>>>>>
>> >>>>>> 1. Can this solution be compatible with existing solutions in
>> Calcite
>> >>>>>> Streaming, materialized view maintenance, and multi-query
>> optimization
>> >>>>>> (Sigma and Delta relational operators, lattice, and Spool
>> operator),
>> >>>>>> 2. Did you find that you needed two separate cost models - one for
>> >>> “view
>> >>>>>> maintenance” and another for “user queries” - since the objectives
>> of
>> >>> each
>> >>>>>> activity are so different?
>> >>>>>> 3. whether this work will hasten the arrival of multi-objective
>> >>> parametric
>> >>>>>> query optimization [1] in Calcite.
>> >>>>>> 4. probably SQL shell support.
>> >>>>>>
>> >>>>>>
>> >>>>>> [1]:
>> >>>>>>
>> >>>
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> >>>>>>
>> >>>>>>
>> >>>>>> -Rui
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com>
>> wrote:
>> >>>>>>>
>> >>>>>>> it would be very nice to see a POC of your work.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pk...@gmail.com>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Hi Julian,
>> >>>>>>>>
>> >>>>>>>> Just wondering if there are any updates? We are wondering if it
>> >>> would
>> >>>>>>> help
>> >>>>>>>> to post our code for a quick preview.
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Botong
>> >>>>>>>>
>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pk...@gmail.com>
>> >>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Julian,
>> >>>>>>>>>
>> >>>>>>>>> Thanks for your interest! Sure let's figure out a plan that best
>> >>>>>>> benefits
>> >>>>>>>>> the community. Here are some clarifications that hopefully
>> answer
>> >>> your
>> >>>>>>>>> questions.
>> >>>>>>>>>
>> >>>>>>>>> In our work (Tempura), users specify the set of time points to
>> >>>>>>>>> consider running and a cost function that expresses their
>> >>>>>>>>> preference over time. Tempura then generates the best incremental
>> >>>>>>>>> plan, i.e. the one that minimizes the overall cost function.
>> >>>>>>>>>
>> >>>>>>>>> In this incremental plan, the sub-plans at different time points
>> >>> can
>> >>>>>>> be
>> >>>>>>>>> different from each other, as opposed to identical plans in all
>> >>> delta
>> >>>>>>>> runs
>> >>>>>>>>> as in streaming or IVM. As mentioned in Section 2.1 of the Tempura
>> paper,
>> >>> we
>> >>>>>>> can
>> >>>>>>>>> mimic the current streaming implementation by specifying two
>> >>> (logical)
>> >>>>>>>> time
>> >>>>>>>>> points in Tempura, representing the initial run and later delta
>> >>> runs
>> >>>>>>>>> respectively. In general, note that Tempura supports various
>> forms
>> >>> of
>> >>>>>>>>> incremental computing, not only the small-delta append-only data
>> >>>>>>> model in
>> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes the
>> >>> current
>> >>>>>>>>> streaming support, as well as any IVM implementations.
>> >>>>>>>>>
>> >>>>>>>>> About the cost model, we did not come up with a separate cost
>> >>> model,
>> >>>>>>> but
>> >>>>>>>>> rather extended the existing one. Similar to multi-objective
>> >>>>>>>> optimization,
>> >>>>>>>>> costs incurred at different time points are considered different
>> >>>>>>>>> dimensions. Tempura lets users supply a function that converts
>> this
>> >>>>>>> cost
>> >>>>>>>>> vector into a final cost. So under this function, any two
>> >>> incremental
>> >>>>>>>> plans
>> >>>>>>>>> are still comparable and there is an overall optimum. I guess we
>> >>> can
>> >>>>>>> go
>> >>>>>>>>> down the route of multi-objective parametric query optimization
>> >>>>>>> instead
>> >>>>>>>> if
>> >>>>>>>>> there is a need.
>> >>>>>>>>>
>> >>>>>>>>> Next on materialized views and multi-query optimization, since
>> our
>> >>>>>>>>> multi-time-point plan naturally involves materializing
>> intermediate
>> >>>>>>>> results
>> >>>>>>>>> for later time points, we need to solve the problem of choosing
>> >>>>>>>>> materializations and include the cost of saving and reusing the
>> >>>>>>>>> materializations when costing and comparing plans. We borrowed
>> the
>> >>>>>>>>> multi-query optimization techniques to solve this problem even
>> >>> though
>> >>>>>>> we
>> >>>>>>>>> are looking at a single query. As a result, we think our work is
>> >>>>>>>> orthogonal
>> >>>>>>>>> to Calcite's facilities around utilizing existing views, lattice
>> >>> etc.
>> >>>>>>> We
>> >>>>>>>> do
>> >>>>>>>>> feel that the multi-query optimization component can be adopted
>> to
>> >>>>>>> wider
>> >>>>>>>>> use, but probably need more suggestions from the community.
>> >>>>>>>>>
>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it
>> >>> should
>> >>>>>>> be
>> >>>>>>>>> straightforward to hook it up with SQL shell.
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>> Botong
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
>> >>> jhyde.apache@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Botong,
>> >>>>>>>>>>
>> >>>>>>>>>> This is very exciting; congratulations on this research, and
>> thank
>> >>>>>>> you
>> >>>>>>>>>> for contributing it back to Calcite.
>> >>>>>>>>>>
>> >>>>>>>>>> The research touches several areas in Calcite: streaming,
>> >>>>>>> materialized
>> >>>>>>>>>> view maintenance, and multi-query optimization. As we have
>> already
>> >>>>>>> some
>> >>>>>>>>>> solutions in those areas (Sigma and Delta relational operators,
>> >>>>>>> lattice,
>> >>>>>>>>>> and Spool operator), it will be interesting to see whether we
>> can
>> >>>>>>> make
>> >>>>>>>> them
>> >>>>>>>>>> compatible, or whether one concept can subsume others.
>> >>>>>>>>>>
>> >>>>>>>>>> Your work differs from streaming queries in that your relations
>> >>> are
>> >>>>>>> used
>> >>>>>>>>>> by “external” user queries, whereas in pure streaming queries,
>> the
>> >>>>>>> only
>> >>>>>>>>>> activity is the change propagation. Did you find that you
>> needed
>> >>> two
>> >>>>>>>>>> separate cost models - one for “view maintenance” and another
>> for
>> >>>>>>> “user
>> >>>>>>>>>> queries” - since the objectives of each activity are so
>> different?
>> >>>>>>>>>>
>> >>>>>>>>>> I wonder whether this work will hasten the arrival of
>> >>> multi-objective
>> >>>>>>>>>> parametric query optimization [1] in Calcite.
>> >>>>>>>>>>
>> >>>>>>>>>> I will make time over the next few days to read and digest your
>> >>>>>>> paper.
>> >>>>>>>>>> Then I expect that we will have a back-and-forth process to
>> create
>> >>>>>>>>>> something that will be useful for the broader community.
>> >>>>>>>>>>
>> >>>>>>>>>> One thing will be particularly useful: making this
>> functionality
>> >>>>>>>>>> available from a SQL shell, so that people can experiment with
>> >>> this
>> >>>>>>>>>> functionality without writing Java code or setting up complex
>> >>>>>>> databases
>> >>>>>>>> and
>> >>>>>>>>>> metadata. I have in mind something like the simple DDL
>> operations
>> >>>>>>> that
>> >>>>>>>> are
>> >>>>>>>>>> available in Calcite’s ’server’ module. I wonder whether we
>> could
>> >>>>>>> devise
>> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
>> >>>>>>>>>>
>> >>>>>>>>>> Julian
>> >>>>>>>>>>
>> >>>>>>>>>> [1]
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>
>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pk...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please
>> >>> refer
>> >>>>>>> to
>> >>>>>>>>>> Fig
>> >>>>>>>>>>> 3(a) in our paper:
>> >>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Botong
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
>> taojiatao@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Seems interesting, but the picture cannot be seen in the mail.
>> >>>>>>>>>>>> Could you open a JIRA for this, so that people who are
>> >>>>>>>>>>>> interested can subscribe to it?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regards!
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Aron Tao
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Botong Huang <bo...@apache.org> 于2020年12月24日周四 上午3:18写道:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a
>> >>> general
>> >>>>>>>>>>>>> incremental query optimizer, based on our research paper
>> >>>>>>> published
>> >>>>>>>> in
>> >>>>>>>>>>>> VLDB
>> >>>>>>>>>>>>> 2021:
>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for
>> >>> incremental
>> >>>>>>>> data
>> >>>>>>>>>>>>> processing
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how
>> Alibaba’s
>> >>>>>>> data
>> >>>>>>>>>>>>> warehouse is planning to use this incremental query
>> optimizer
>> >>> to
>> >>>>>>>>>>>> alleviate
>> >>>>>>>>>>>>> cluster-wise resource skewness:
>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware
>> >>> Incremental
>> >>>>>>>>>>>> Computing
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based
>> >>>>>>>>>> incremental
>> >>>>>>>>>>>>> optimizer that can find the best plan across multiple
>> families
>> >>> of
>> >>>>>>>>>>>>> incremental computing methods, including IVM, Streaming,
>> >>>>>>> DBToaster,
>> >>>>>>>>>> etc.
>> >>>>>>>>>>>>> Experiments (in the paper) show that the generated best
>> plan
>> >>> is
>> >>>>>>>>>>>>> consistently much better than the plans from each individual
>> >>>>>>> method
>> >>>>>>>>>>>> alone.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> In general, incremental query planning is central to
>> database
>> >>>>>>> view
>> >>>>>>>>>>>>> maintenance and stream processing systems, and is being
>> >>> adopted
>> >>>>>>> in
>> >>>>>>>>>>>> active
>> >>>>>>>>>>>>> databases, resumable query execution, approximate query
>> >>>>>>> processing,
>> >>>>>>>>>> etc.
>> >>>>>>>>>>>> We
>> >>>>>>>>>>>>> are hoping that this feature can help widen the spectrum
>> of
>> >>>>>>>>>> Calcite,
>> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Below is a brief description of the technical details.
>> Please
>> >>>>>>> refer
>> >>>>>>>> to
>> >>>>>>>>>>>> the
>> >>>>>>>>>>>>> Tempura paper for more details. We are also working on a
>> >>> journal
>> >>>>>>>>>> version
>> >>>>>>>>>>>> of
>> >>>>>>>>>>>>> the paper with more implementation details.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be
>> >>>>>>>> executed
>> >>>>>>>>>>>>> altogether at once. In the proposal, Calcite’s memo will be
>> >>>>>>> extended
>> >>>>>>>>>> with
>> >>>>>>>>>>>>> temporal information so that it is capable of generating
>> >>>>>>> incremental
>> >>>>>>>>>>>> plans
>> >>>>>>>>>>>>> that include multiple sub-plans to execute at different time
>> >>>>>>> points.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over
>> >>> time
>> >>>>>>>>>> (Time
>> >>>>>>>>>>>>> Varying Relations (TVR)). To achieve that we introduced
>> >>>>>>> TvrMetaSet
>> >>>>>>>>>> into
>> >>>>>>>>>>>>> Calcite’s memo besides RelSet and RelSubset to track related
>> >>>>>>> RelSets
>> >>>>>>>>>> of a
>> >>>>>>>>>>>>> changing table (e.g. snapshot of the table at certain time,
>> >>>>>>> delta of
>> >>>>>>>>>> the
>> >>>>>>>>>>>>> table between two time points, etc.).
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> [image: image.png]
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> For example in the above figure, each vertical line is a
>> >>>>>>> TvrMetaSet
>> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.).
>> >>> Horizontal
>> >>>>>>>> lines
>> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet.
>> Users
>> >>> can
>> >>>>>>>>>> write
>> >>>>>>>>>>>> TVR
>> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations between
>> these
>> >>>>>>> dots.
>> >>>>>>>>>> For
>> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules that describe
>> how
>> >>> to
>> >>>>>>>>>> compute
>> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The red
>> >>> lines
>> >>>>>>>> are
>> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a TVR.
>> All
>> >>>>>>> TVR
>> >>>>>>>>>>>> rewrite
>> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still
>> work
>> >>> in
>> >>>>>>>> the
>> >>>>>>>>>> new
>> >>>>>>>>>>>>> volcano system without modification.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and
>> >>>>>>> RelNodes,
>> >>>>>>>>>> as
>> >>>>>>>>>>>>> well as links in between the nodes.
>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule
>> >>>>>>> engine
>> >>>>>>>>>> API.
>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best
>> incremental
>> >>>>>>> plan
>> >>>>>>>>>>>>> involving multiple time points.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus
>> when
>> >>>>>>>>>> disabled,
>> >>>>>>>>>>>>> does not change any existing Calcite behavior.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this
>> >>>>>>>>>> Calcite-extended
>> >>>>>>>>>>>>> incremental query optimizer to a type of periodic query
>> called
>> >>>>>>> the
>> >>>>>>>>>>>> ‘‘range
>> >>>>>>>>>>>>> query’’ in Alibaba’s data warehouse. It achieved cost
>> savings
>> >>> of
>> >>>>>>> 80%
>> >>>>>>>>>> on
>> >>>>>>>>>>>>> total CPU and memory consumption, and 60% on end-to-end
>> >>> execution
>> >>>>>>>>>> time.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy
>> >>>>>>> holidays!
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>> Botong
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> ~~~~~~~~~~~~~~~
>> >>>>>>> no mistakes
>> >>>>>>> ~~~~~~~~~~~~~~~~~~
>> >>>>>>>
>> >>>>>>
>> >>>
>> >>
>>
>>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Botong Huang <pk...@gmail.com>.
Hi Julian and all,

We've posted the Tempura code base below. Feel free to take a quick peek at
the last five commits.
https://github.com/alibaba/cost-based-incremental-optimizer/commits/main

I've also opened a Jira (CALCITE-4568
<https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve as
the umbrella Jira for the feature.

In the meantime, we encourage everyone to enter the time preferences for
our first meeting here:
https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

Thanks,
Botong

On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jh...@gmail.com> wrote:

> I have added my time preferences to the doc.
>
> Before we meet, could you publish a PR for us to review?
>
> Initial discussions will need to be about architecture and high-level
> design. So I would ask Calcite reviewers not to review the PR line-by-line
> (or to leave comments in GitHub) but try to understand the design
> holistically, and prepare questions/comments before the meeting.
>
> Botong, can you please create a Calcite JIRA case for this task? JIRA is how
> we track long-running tasks such as this.
>
> Julian
>
>
> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pk...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Apologies for the delay. It took us some time to clean up our code base and
> > publicly release it (which will be out soon) for a quick peek.
> >
> > We are ready to present our work. Let's schedule a time for a Zoom
> > meeting and discuss how to integrate Tempura into Calcite.
> >
> > Since some of our team members are in China, we prefer the time slot of
> > 7:00pm-11:30pm PST any day. I've added our time preference in the shared
> > doc below.
> >
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >
> > We encourage everyone to add their time preferences (during 04/15-04/30)
> in
> > this doc. In a week or so, we will try to settle a time that works for
> > most.
> >
> > Thanks,
> > Botong
> >
> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pk...@gmail.com> wrote:
> >
> >> Hi Julian and Rui,
> >>
> >> Sounds good to us. Please give us some time to prepare some slides for
> the
> >> meeting.
> >>
> >> I've created a doc below for discussion. Please feel free to add more in
> >> here:
> >>
> >>
> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >>
> >> Thanks,
> >> Botong
> >>
> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jh...@gmail.com>
> >> wrote:
> >>
> >>> PS The “editable doc” that Rui refers to is also a good idea. I think
> we
> >>> should create it to continue discussion after the first meeting.
> >>>
> >>> Julian
> >>>
> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jh...@gmail.com>
> >>> wrote:
> >>>>
> >>>> I think good next steps would be a PR and a meeting. The PR will
> allow
> >>> us to read the code, but I think we should do the first round of
> questions
> >>> at the meeting.  The meeting could perhaps start with a presentation
> of the
> >>> paper (do you have some slides you are planning to present at VLDB,
> >>> Botong?) and then move on to questions about the concepts, which
> >>> alternatives were considered, and how the concepts map onto other
> current
> >>> and future concepts in calcite.
> >>>>
> >>>> I don’t think we should start “reviewing” the PR line-by-line at this
> >>> point. We need to understand the high-level concepts and design
> choices. If
> >>> we start reviewing the PR we will get lost in the details.
> >>>>
> >>>> I know that integrating a major change is hard; I doubt that we will
> be
> >>> able to integrate everything, but we can build understanding about
> where
> >>> calcite needs to go, and I hope integrate a good amount of code to
> help us
> >>> get there.
> >>>>
> >>>> As I said before, after the integration I would like people to be able
> >>> to experiment with it and use it in their production systems.  That
> way, it
> >>> will not be an experiment that withers, but a feature set that integrates
> with
> >>> other calcite features and gets stronger over time.
> >>>>
> >>>> Julian
> >>>>
> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <am...@apache.org> wrote:
> >>>>>
> >>>>> For me to participate in the discussion for the above questions, I
> >>> will
> >>>>> need to read a lot more to know relevant context and likely ask lots
> of
> >>>>> questions :-).  An editable doc is probably good for questions and
> >>>>> back-and-forth discussion.
> >>>>>
> >>>>>
> >>>>> -Rui
> >>>>>
> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <am...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>> I am also happy to help push this work into Calcite (review code and
> >>> doc,
> >>>>>> etc.).
> >>>>>>
> >>>>>> While you can share your code so people can have more idea how it is
> >>>>>> implemented, I think it would be also nice to have a doc to discuss
> >>> open
> >>>>>> questions above. Some points that I copied here:
> >>>>>>
> >>>>>> 1. Can this solution be compatible with existing solutions in
> Calcite
> >>>>>> Streaming, materialized view maintenance, and multi-query
> optimization
> >>>>>> (Sigma and Delta relational operators, lattice, and Spool operator),
> >>>>>> 2. Did you find that you needed two separate cost models - one for
> >>> “view
> >>>>>> maintenance” and another for “user queries” - since the objectives
> of
> >>> each
> >>>>>> activity are so different?
> >>>>>> 3. whether this work will hasten the arrival of multi-objective
> >>> parametric
> >>>>>> query optimization [1] in Calcite.
> >>>>>> 4. probably SQL shell support.
> >>>>>>
> >>>>>>
> >>>>>> [1]:
> >>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>>>>
> >>>>>>
> >>>>>> -Rui
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zi...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> it would be very nice to see a POC of your work.
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pk...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi Julian,
> >>>>>>>>
> >>>>>>>> Just wondering if there are any updates? We are wondering if it would
> >>>>>>>> help to post our code for a quick preview.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Botong
> >>>>>>>>
> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pk...@gmail.com>
> >>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Julian,
> >>>>>>>>>
> >>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best benefits
> >>>>>>>>> the community. Here are some clarifications that hopefully answer your
> >>>>>>>>> questions.
> >>>>>>>>>
> >>>>>>>>> In our work (Tempura), users specify the set of time points at which to
> >>>>>>>>> consider running, and a cost function that expresses their preference over
> >>>>>>>>> time. Tempura then generates the best incremental plan, that is, the one
> >>>>>>>>> that minimizes the overall cost function.
> >>>>>>>>>
> >>>>>>>>> In this incremental plan, the sub-plans at different time points can be
> >>>>>>>>> different from each other, as opposed to identical plans in all delta runs
> >>>>>>>>> as in streaming or IVM. As mentioned in Section 2.1 of the Tempura paper, we
> >>>>>>>>> can mimic the current streaming implementation by specifying two (logical)
> >>>>>>>>> time points in Tempura, representing the initial run and later delta runs
> >>>>>>>>> respectively. In general, note that Tempura supports various forms of
> >>>>>>>>> incremental computing, not only the small-delta append-only data model in
> >>>>>>>>> streaming systems. That's why we believe Tempura subsumes the current
> >>>>>>>>> streaming support, as well as any IVM implementations.
> >>>>>>>>>
> >>>>>>>>> About the cost model, we did not come up with a separate cost model, but
> >>>>>>>>> rather extended the existing one. Similar to multi-objective optimization,
> >>>>>>>>> costs incurred at different time points are considered different
> >>>>>>>>> dimensions. Tempura lets users supply a function that converts this cost
> >>>>>>>>> vector into a final cost. So under this function, any two incremental plans
> >>>>>>>>> are still comparable and there is an overall optimum. I guess we can go
> >>>>>>>>> down the route of multi-objective parametric query optimization instead if
> >>>>>>>>> there is a need.
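
To make the cost-vector idea above concrete, here is a minimal Python sketch. It is an illustration only, not Tempura or Calcite code: the names (weighted_sum, best_plan), the weighted-sum form of the user function, and all the numbers are assumptions chosen for the example. Each plan has one cost entry per time point, and a user-supplied function collapses that vector into a single number so that any two plans can be compared.

```python
from typing import Callable, Dict, Sequence

# Hypothetical names throughout; none of this is Tempura or Calcite API.
CostVector = Sequence[float]                   # one entry per time point
CostFunction = Callable[[CostVector], float]   # user-supplied scalarization


def weighted_sum(weights: Sequence[float]) -> CostFunction:
    """Build a user cost function that weights the cost at each time point."""
    def cost(vector: CostVector) -> float:
        if len(vector) != len(weights):
            raise ValueError("cost vector and weights differ in length")
        return sum(w * c for w, c in zip(weights, vector))
    return cost


def best_plan(plans: Dict[str, CostVector], cost: CostFunction) -> str:
    """Return the name of the plan with the lowest scalar cost."""
    return min(plans, key=lambda name: cost(plans[name]))


# Two time points: an early run and the final run (made-up numbers).
plans = {
    "batch_only": [0.0, 100.0],   # do all the work at the final time point
    "incremental": [70.0, 20.0],  # precompute early, finish with a small delta
}

early_is_cheap = weighted_sum([0.2, 1.0])   # early resources are discounted
early_is_pricey = weighted_sum([2.0, 1.0])  # early resources are expensive

chosen_when_cheap = best_plan(plans, early_is_cheap)
chosen_when_pricey = best_plan(plans, early_is_pricey)
```

With these made-up numbers, the incremental plan wins when early resources are discounted and the batch plan wins when they are expensive, which is the kind of preference over time that the user function is meant to express.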
> >>>>>>>>>
> >>>>>>>>> Next, on materialized views and multi-query optimization: since our
> >>>>>>>>> multi-time-point plan naturally involves materializing intermediate results
> >>>>>>>>> for later time points, we need to solve the problem of choosing
> >>>>>>>>> materializations and include the cost of saving and reusing the
> >>>>>>>>> materializations when costing and comparing plans. We borrowed
> >>>>>>>>> multi-query optimization techniques to solve this problem even though we
> >>>>>>>>> are looking at a single query. As a result, we think our work is orthogonal
> >>>>>>>>> to Calcite's facilities around utilizing existing views, lattice, etc. We do
> >>>>>>>>> feel that the multi-query optimization component can be adopted for wider
> >>>>>>>>> use, but we probably need more suggestions from the community.
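
As a rough illustration of why the cost of saving and reusing a materialization has to enter the comparison, here is a toy Python sketch. It is not the algorithm from the paper: the Intermediate class, its three cost fields, and the simple "recompute versus save and read back" rule are all assumptions made for this example, and a real multi-query optimizer also has to handle interactions between materialization choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    """Hypothetical cost figures for one intermediate result (arbitrary units)."""
    compute_cost: float  # cost to compute the result from its inputs
    save_cost: float     # one-off cost to write the materialization
    read_cost: float     # cost to read it back at a later time point


def cost_without_materializing(x: Intermediate, uses: int) -> float:
    # Every use recomputes the result from scratch.
    return uses * x.compute_cost


def cost_with_materializing(x: Intermediate, uses: int) -> float:
    # Compute and save once, then read the saved copy for each later use.
    if uses == 0:
        return 0.0
    return x.compute_cost + x.save_cost + (uses - 1) * x.read_cost


def should_materialize(x: Intermediate, uses: int) -> bool:
    return cost_with_materializing(x, uses) < cost_without_materializing(x, uses)


join_result = Intermediate(compute_cost=50.0, save_cost=10.0, read_cost=5.0)
used_once = should_materialize(join_result, uses=1)
used_three_times = should_materialize(join_result, uses=3)
```

Under these assumed numbers, saving the result does not pay off when it is used at a single time point, but it does once three time points need it.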
> >>>>>>>>>
> >>>>>>>>> Lastly, our current implementation is set up in Java code, but it should be
> >>>>>>>>> straightforward to hook it up with a SQL shell.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Botong
> >>>>>>>>>
> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <
> >>> jhyde.apache@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Botong,
> >>>>>>>>>>
> >>>>>>>>>> This is very exciting; congratulations on this research, and thank you
> >>>>>>>>>> for contributing it back to Calcite.
> >>>>>>>>>>
> >>>>>>>>>> The research touches several areas in Calcite: streaming, materialized
> >>>>>>>>>> view maintenance, and multi-query optimization. As we already have some
> >>>>>>>>>> solutions in those areas (Sigma and Delta relational operators, lattice,
> >>>>>>>>>> and Spool operator), it will be interesting to see whether we can make them
> >>>>>>>>>> compatible, or whether one concept can subsume others.
> >>>>>>>>>>
> >>>>>>>>>> Your work differs from streaming queries in that your relations are used
> >>>>>>>>>> by “external” user queries, whereas in pure streaming queries, the only
> >>>>>>>>>> activity is the change propagation. Did you find that you needed two
> >>>>>>>>>> separate cost models - one for “view maintenance” and another for “user
> >>>>>>>>>> queries” - since the objectives of each activity are so different?
> >>>>>>>>>>
> >>>>>>>>>> I wonder whether this work will hasten the arrival of multi-objective
> >>>>>>>>>> parametric query optimization [1] in Calcite.
> >>>>>>>>>>
> >>>>>>>>>> I will make time over the next few days to read and digest your paper.
> >>>>>>>>>> Then I expect that we will have a back-and-forth process to create
> >>>>>>>>>> something that will be useful for the broader community.
> >>>>>>>>>>
> >>>>>>>>>> One thing will be particularly useful: making this functionality
> >>>>>>>>>> available from a SQL shell, so that people can experiment with this
> >>>>>>>>>> functionality without writing Java code or setting up complex databases and
> >>>>>>>>>> metadata. I have in mind something like the simple DDL operations that are
> >>>>>>>>>> available in Calcite’s ‘server’ module. I wonder whether we could devise
> >>>>>>>>>> some kind of SQL syntax for a “multi-query”.
> >>>>>>>>>>
> >>>>>>>>>> Julian
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>> https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pk...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Aron for pointing this out. To see the figure, please refer to
> >>>>>>>>>>> Fig 3(a) in our paper:
> >>>>>>>>>>> https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Botong
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <
> taojiatao@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you open
> >>>>>>>>>>>> a JIRA for this, so that people who are interested can subscribe to it?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Aron Tao
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Botong Huang <bo...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general
> >>>>>>>>>>>>> incremental query optimizer, based on our research paper published in
> >>>>>>>>>>>>> VLDB 2021:
> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental data
> >>>>>>>>>>>>> processing
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> >>>>>>>>>>>>> warehouse is planning to use this incremental query optimizer to alleviate
> >>>>>>>>>>>>> cluster-wise resource skewness:
> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> >>>>>>>>>>>>> Computing
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based
> >>>>>>>>>>>>> incremental optimizer that can find the best plan across multiple families
> >>>>>>>>>>>>> of incremental computing methods, including IVM, Streaming, DBToaster, etc.
> >>>>>>>>>>>>> Experiments (in the paper) show that the generated best plan is
> >>>>>>>>>>>>> consistently much better than the plans from each individual method alone.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In general, incremental query planning is central to database view
> >>>>>>>>>>>>> maintenance and stream processing systems, and is being adopted in active
> >>>>>>>>>>>>> databases, resumable query execution, approximate query processing, etc. We
> >>>>>>>>>>>>> are hoping that this feature can help widen the spectrum of Calcite and
> >>>>>>>>>>>>> solicit more use cases and adoption of Calcite.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Below is a brief description of the technical details. Please refer to the
> >>>>>>>>>>>>> Tempura paper for more details. We are also working on a journal version of
> >>>>>>>>>>>>> the paper with more implementation details.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be executed
> >>>>>>>>>>>>> all at once. In the proposal, Calcite’s memo will be extended with
> >>>>>>>>>>>>> temporal information so that it is capable of generating incremental plans
> >>>>>>>>>>>>> that include multiple sub-plans to execute at different time points.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time (a Time
> >>>>>>>>>>>>> Varying Relation (TVR)). To achieve that we introduced TvrMetaSet into
> >>>>>>>>>>>>> Calcite’s memo, alongside RelSet and RelSubset, to track related RelSets of
> >>>>>>>>>>>>> a changing table (e.g. snapshot of the table at a certain time, delta of
> >>>>>>>>>>>>> the table between two time points, etc.).
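
The grouping role described above can be sketched in a few lines of Python. This is only a guess at the shape of the idea, not the actual TvrMetaSet implementation: the class TvrMetaSetSketch, the Snapshot and Delta keys, and the integer RelSet ids are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Snapshot:
    """The state of the relation at one time point."""
    at: int


@dataclass(frozen=True)
class Delta:
    """The change to the relation between two time points."""
    start: int
    end: int


TvrKey = Union[Snapshot, Delta]


@dataclass
class TvrMetaSetSketch:
    """Groups the RelSets that describe one time-varying relation."""
    name: str
    rel_sets: Dict[TvrKey, int] = field(default_factory=dict)

    def link(self, key: TvrKey, rel_set_id: int) -> None:
        # Record that the given RelSet holds this snapshot or delta of the TVR.
        self.rel_sets[key] = rel_set_id

    def find(self, key: TvrKey) -> Optional[int]:
        # Return the RelSet id for the snapshot or delta, if one is linked.
        return self.rel_sets.get(key)


orders = TvrMetaSetSketch("orders")
orders.link(Snapshot(at=1), rel_set_id=10)
orders.link(Delta(start=1, end=2), rel_set_id=11)
orders.link(Snapshot(at=2), rel_set_id=12)
```

The point of the sketch is only that one object ties together several memo entries that are all views of the same changing table, so that a rule can navigate from, say, the snapshot at time 1 to the delta between times 1 and 2.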
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet
> >>>>>>>>>>>>> representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
> >>>>>>>>>>>>> represent time. Each black dot in the grid is a RelSet. Users can write TVR
> >>>>>>>>>>>>> Rewrite Rules to describe valid transformations between these dots. For
> >>>>>>>>>>>>> example, the blue lines are inter-TVR rules that describe how to compute a
> >>>>>>>>>>>>> certain RelSet of a TVR from RelSets of other TVRs. The red lines are
> >>>>>>>>>>>>> intra-TVR rules that describe transformations within a TVR. All TVR rewrite
> >>>>>>>>>>>>> rules are logical rules. All existing Calcite rules still work in the new
> >>>>>>>>>>>>> volcano system without modification.
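
To give a feel for the two kinds of rules, here is a small worked example in Python. It is a sketch under simplifying assumptions, not one of the actual TvrRules: it uses an inner join rather than the left outer join in the figure, assumes append-only changes, and represents relations as plain lists of (key, value) rows. The intra-TVR step is "new snapshot = old snapshot plus delta"; the inter-TVR step is the standard identity that the delta of a join can be computed from the deltas and old snapshots of its inputs.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[int, str]


def join(left: List[Row], right: List[Row]) -> Counter:
    """Inner equi-join on the first column, with bag (multiset) semantics."""
    return Counter(
        (lk, lv, rv) for (lk, lv) in left for (rk, rv) in right if lk == rk
    )


# Snapshots at the first time point and the appended rows (deltas).
r_old = [(1, "r1"), (2, "r2")]
s_old = [(1, "s1")]
r_delta = [(1, "r3")]
s_delta = [(2, "s2"), (1, "s3")]

# Intra-TVR style step: a later snapshot is the old snapshot plus the delta.
r_new = r_old + r_delta
s_new = s_old + s_delta

# Inter-TVR style step: the delta of the join TVR from the input TVRs.
join_delta = join(r_delta, s_old) + join(r_old, s_delta) + join(r_delta, s_delta)

# Applying the intra-TVR step to the join itself gives its new snapshot.
join_new = join(r_old, s_old) + join_delta
```

Here join_new holds the same rows as joining the two new snapshots directly, but it was obtained by reusing the old join result and touching only the deltas, which is the kind of alternative an incremental optimizer can cost against a full recomputation.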
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> All changes in this feature will consist of four parts:
> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
> >>>>>>>>>>>>> well as links in between the nodes.
> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan
> >>>>>>>>>>>>> involving multiple time points.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus, when disabled,
> >>>>>>>>>>>>> does not change any existing Calcite behavior.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Other than the scenarios in the paper, we also applied this
> >>>>>>>>>>>>> Calcite-extended incremental query optimizer to a type of periodic query
> >>>>>>>>>>>>> called the “range query” in Alibaba’s data warehouse. It achieved cost
> >>>>>>>>>>>>> savings of 80% on total CPU and memory consumption, and 60% on end-to-end
> >>>>>>>>>>>>> execution time.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Botong
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> ~~~~~~~~~~~~~~~
> >>>>>>> no mistakes
> >>>>>>> ~~~~~~~~~~~~~~~~~~
> >>>>>>>
> >>>>>>
> >>>
> >>
>
>

Re: Proposal to extend Calcite into a incremental query optimizer

Posted by Julian Hyde <jh...@gmail.com>.
I have added my time preferences to the doc.

Before we meet, could you publish a PR for us to review?

Initial discussions will need to be about architecture and high-level design. So I would ask Calcite reviewers not to review the PR line-by-line (or leave comments in GitHub) but to try to understand the design holistically, and to prepare questions/comments before the meeting.

Botong, can you please create a Calcite JIRA case for this task? JIRA is how we track long-running tasks such as this.

Julian

