You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Julian Hyde <jh...@gmail.com> on 2022/01/12 18:42:22 UTC

[DISCUSS] Writing good summaries for Jira cases

Hi all,

Can we discuss how we write summaries for Jira cases? In my opinion it’s really important, because summaries become commit messages, and commit messages become release notes, which is how most people figure out what is in Calcite. I spend a lot of my time working with people to write good summaries.

I’d like some feedback on whether this approach is useful. And to try to teach people how to do it for themselves.

Consider this case https://issues.apache.org/jira/browse/CALCITE-4983 <https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a still current case because it doesn’t yet have an ‘answer’.)

The current summary is

>  In SubstitutionVisitor's unifyAggregates, if Aggregate has
>  grouping sets, we need to handle the condition needs to pull up.

It describes the cause but it doesn’t describe the problem (or the symptoms the user sees).

If you take your car into your mechanic the cause is ‘Leaky gasket results in oil dripping onto hot manifold’ but the problem is ‘Smoke comes from hood when engine gets hot’. Do you agree that the second description is much more useful?

In this case, the author came up with an example:

> Here is an example:
>
> sql: select empid, deptno from emps group by grouping sets ((empid, deptno),(empid))
> mv: select empid, count(distinct deptno) from emps where empid>100
>   group by grouping sets ((empid, deptno), (empid)) 
>
> the result plan is:
>
>   LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
>     LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]], EXPR$1=[COUNT(DISTINCT $1)])
>       EnumerableTableScan(table=[[hr, MV0]])  
>
> We can see that this plan doesn't handle the condition empid>100

I think it’s a great example. I especially like the last line, where the author pointed out what was wrong. I suggest the following summary:

> Incorrect plan for query that has GROUPING SETS and WHERE

Do you think the summary is more useful? Can it be improved?

Julian




Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi,

Let me start by saying that this discussion is more about guidelines rather
than rules which can make the job of certain people easier so such
discussions are important.

Personally, I prefer having the problem/symptoms in the summary of the
JIRA/commit. If the root cause fits (does not make the summary overly long)
then I might put it as well.

From my perspective this makes the life of end users, reviewers, and
newcomers to a project easier.

Let's consider for example the role of support engineers. This group of
people are using JIRA very often (maybe even more than developers) and
usually have to deal with a stack consisting of many separate projects.
When the customer reports a problem it's not easy to dive directly into the
code even if they know exactly which project is to blame.
Usually the first thing to do is search the JIRA for cases reporting
similar problems/symptoms.

For reviewers, it is important to understand what is the problem that we're
trying to fix and if that is not in the Jira their job becomes much more
difficult. When people put too much emphasis on the root cause or the
solution then it's hard to understand why the change is needed in the first
place. I've seen a lot of bug reports where people submit a PR (it might or
might not have a test) without really saying what the problem is. I don't
mind if this happens since as other people mentioned not everyone
works/thinks the same way but I will ask clarifications till I understand
what the problem is otherwise it's hard for me to review a part that
potentially I don't know much about.

For people new to the project, I would argue that it's much easier to
understand a case if it describes a problem/symptoms rather than the root
cause/solution. This is just from my personal experience when I dive into
new projects. Sometimes people start a JIRA/PR without much context by
proposing changes to some classes and I cannot possibly follow the
discussion unless I written those classes myself. On the other hand
everybody can understand expressions like "wrong results", "invalid plan",
"missing filter", "memory leak", "Exception when ...".

So far we really focused into bug reports but what's also important for the
summary to convey is the nature of the change (bug vs feature vs
refactoring vs perf improvements, etc.). This can be done if the words are
chosen correctly.

Best,
Stamatis

On Fri, Jan 14, 2022, 11:30 PM Jacques Nadeau <ja...@apache.org> wrote:

> >
> > Under the current process, the Jira subject becomes the commit message
>
>
> I didn't know this was perceived as a "rule" and I don't follow it (oops?).
> I also don't agree with it being a good idea. I typically put substantially
> more information in a commit message than a jira subject. A commit message
> also comes after resolution whereas a jira subject comes before resolution.
> I don't generally think it is a good idea to change a jira subject on
> commit as I find it confusing when you're actually tracking a jira. If I
> were to change the subject on commit, I would be inclined to change it to
> what I changed, not a symptom.
>
>
> > As a reviewer, am I within my rights to say ‘I’m not going to even read
> > your code until you tell me what problem you are trying to solve?’.
>
>
> If a commit message doesn't clearly state the purpose of the patch, I'd be
> completely comfortable asking the user to update the commit message to
> clarify the purpose of the patch. I definitely wouldn't say it like that
> quote does.
>
>
> > Given that we have far more contributions than active reviews, and that I
> > am one of the most prolific reviewers, I suspect that I am.
>
>
> I don't think this is a constructive statement. It presents an argument
> (likely accidental) that because you do more of this kind of work, your
> opinion matters more than others'. I don't agree with this sentiment. If
> anything, I think we should be doing things to increase others'
> contributions--potentially at the cost of some small fraction of yours.
> Having a bus factor of one is a disservice to the community. Don't take
> this as a lack of appreciation and awe of your amazing contributions, just
> an expression of discomfort around what effectively feels like "pulling
> rank".
>
> what about reviewers and end users?
>
>
> I agree that these stakeholders are important. That being said, I haven't
> heard a lot of end users' complaints about our jira subjects or commit
> messages. Did I miss this?
>

Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Jacques Nadeau <ja...@apache.org>.
>
> Under the current process, the Jira subject becomes the commit message


I didn't know this was perceived as a "rule" and I don't follow it (oops?).
I also don't agree with it being a good idea. I typically put substantially
more information in a commit message than a jira subject. A commit message
also comes after resolution whereas a jira subject comes before resolution.
I don't generally think it is a good idea to change a jira subject on
commit as I find it confusing when you're actually tracking a jira. If I
were to change the subject on commit, I would be inclined to change it to
what I changed, not a symptom.


> As a reviewer, am I within my rights to say ‘I’m not going to even read
> your code until you tell me what problem you are trying to solve?’.


If a commit message doesn't clearly state the purpose of the patch, I'd be
completely comfortable asking the user to update the commit message to
clarify the purpose of the patch. I definitely wouldn't say it like that
quote does.


> Given that we have far more contributions than active reviews, and that I
> am one of the most prolific reviewers, I suspect that I am.


I don't think this is a constructive statement. It presents an argument
(likely accidental) that because you do more of this kind of work, your
opinion matters more than others'. I don't agree with this sentiment. If
anything, I think we should be doing things to increase others'
contributions--potentially at the cost of some small fraction of yours.
Having a bus factor of one is a disservice to the community. Don't take
this as a lack of appreciation and awe of your amazing contributions, just
an expression of discomfort around what effectively feels like "pulling
rank".

what about reviewers and end users?


I agree that these stakeholders are important. That being said, I haven't
heard a lot of end users' complaints about our jira subjects or commit
messages. Did I miss this?

Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Julian Hyde <jh...@gmail.com>.
There’s been a lot of talk in this thread about what is convenient for developers. Like some developers saying that they wanted to be able to find a commit that they made several years ago.

But what about reviewers and end users?

Under the current process, the Jira subject becomes the commit message, the commit message becomes the release notes, and the release notes become the documentation.

As a reviewer, am I within my rights to say ‘I’m not going to even read your code until you tell me what problem you are trying to solve?’. Given that we have far more contributions than active reviews, and that I am one of the most prolific reviewers, I suspect that I am.

Julian
 

> On Jan 14, 2022, at 10:45 AM, Jacques Nadeau <ja...@apache.org> wrote:
> 
> I don't think we should have any rules about summaries. Per Vladimir Ozerov
> comments, being militant about something that is relatively subjective is a
> good way to dissuade high quality contribution. I definitely find that
> rules around such minor things distract from high quality discussion
> around meaty architectural and design topics. As long as the case in its
> entirety does an okay job of communicating cause and effect, let's not get
> caught up in where each is expressed. (Frankly, many of the issues at hand
> are so complex that I often find that if you really want to understand what
> is going on, you're better off reviewing the added test cases
> associated with a change.)
> 
> This actually reminds me of something else I've seen in some communities
> where there has been great sensitivity to the grammar, spelling, etc of
> issues written. I think we should also avoid chasing that pattern. It's an
> anglo-centric approach to open source. Contributions come from all over the
> world. If the ideas are communicated, let's not punish people who may speak
> English as a third or fourth language.
> 
> 
> 
> On Thu, Jan 13, 2022 at 7:20 PM Yanjing Wang <zh...@gmail.com>
> wrote:
> 
>> I agree with Vladimir Ozerov, When I was a newcomer to Calcite, I just used
>> its core part rather than framework, so I don't know some concepts. I don't
>> know what JDBC adapter really is,  such as CALCITE-4901, CALCITE-4740 I
>> reported before. I don't know why the title needs to add 'JDBC adapter '.
>> Another example is CALCITE-4683, I don't know how I could describe the
>> symptom, because it's just an online bad case, I can't summarize it well
>> before I found a more simple and representative case. Frankly, as newcomers
>> will be more, I think the community needs patience and a friendly guide to
>> make the summaries friendly to the most contributors. I appreciate Julian
>> Hyde's work very much, He found the summary problem and tried to resolve it
>> which gave me lots of trouble for a long time.
>> 
>> xiong duan <no...@gmail.com> 于2022年1月13日周四 19:56写道:
>> 
>>> Hi all,
>>>  I prefer to describe the symptoms the user sees. Because For developers
>>> and users both can know what happens when Calcite has a wrong result or
>>> unexpected exception. If the user encounters a problem, I guess he wants
>> to
>>> find the related symptoms summary in the git log, not the root cause. The
>>> root cause requires the user have to do some debug or know related
>>> implement in Calcite. I think this is not very friendly to describe
>>> the root cause. When we release a new Calcite version with use the
>>> description of the symptoms, I think the commit log is very friendly for
>>> users and followers to know what has been fixed or improved.
>>>   But for CALCITE-4983, I think  `Incorrect plan for query that has
>>> GROUPING SETS and WHERE` is not enough, Because only MV rewriting have
>>> wrong according to the description, not all queries. So I suggest `MV
>>> rewriting generates the incorrect plan for query that has GROUPING SETS
>> and
>>> WHERE`
>>>  In addition, I admit this forced me to change my natural way of
>> thinking
>>> and not every issue I can describe correctly. But I think this is good
>> for
>>> me.
>>> 
>>> Viliam Durina <vi...@hazelcast.com> 于2022年1月13日周四 16:11写道:
>>> 
>>>> Hi all,
>>>> 
>>>> I also prefer root cause over end effect, if it's known. The end effect
>>>> should definitely be mentioned in the body. I even tend to edit
>> summaries
>>>> when I start with the end effect and find out the root cause later.
>>>> 
>>>> I also prefer a more impersonal tone. In our example I prefer this:
>>>> 
>>>>> SubstitutionVisitor.unifyAggregates() doesn't handle conditions if
>>>> Aggregate
>>>>> has grouping sets
>>>> 
>>>> Viliam
>>>> 
>>>> On Wed, 12 Jan 2022 at 20:38, Vladimir Ozerov <pp...@gmail.com>
>>> wrote:
>>>> 
>>>>> Hi Julian,
>>>>> 
>>>>> In my opinion, both ways work well. People tend to think differently.
>>>> Some
>>>>> prefer symptoms, others - the root cause. I personally prefer the
>>> latter
>>>>> for the following reason. If I face a problem, I first try to debug
>> it
>>> on
>>>>> my own. The result of the analysis is usually some questionable
>>> behavior
>>>> in
>>>>> a specific part of the code. Once you find the problematic place, you
>>> can
>>>>> run a search in JIRA or Git log (class name, feature name, etc) and
>>> check
>>>>> whether somebody else faced a similar issue. The description
>> "Incorrect
>>>>> plan ..." is less likely to help me than more concrete "In
>>>>> SubstitutionVisitor ...". Especially, given that a single root cause
>>> may
>>>>> manifest in several ways. But I would like to stress out - it is a
>>> matter
>>>>> of personal habits and previous experience, not something that I
>>>>> expect others to follow.
>>>>> 
>>>>> In the past, I worked on the Apache Ignite project. We had a number
>> of
>>>>> contribution rules, such as "put a comma here", "set the proper
>>> component
>>>>> there", "write the comment in that way", etc. I was the one who
>>> actively
>>>>> enforced this for a may years, because it gave the feeling that
>>>> everything
>>>>> is "put in order". Eventually, I came to the conclusion that this
>> does
>>>> more
>>>>> harm than good, because I regularly observed confusion and
>>>> dissatisfaction
>>>>> of the new contributors (and Apache Ignite community is far less
>>> diverse
>>>>> and active than in Apache Calcite), as they were forced to change
>> their
>>>>> natural way of thinking or past habits to engage with the community.
>>>>> 
>>>>> Regards,
>>>>> Vladimir.
>>>>> 
>>>>> ср, 12 янв. 2022 г. в 21:42, Julian Hyde <jh...@gmail.com>:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Can we discuss how we write summaries for Jira cases? In my opinion
>>>> it’s
>>>>>> really important, because summaries become commit messages, and
>>> commit
>>>>>> messages become release notes, which is how most people figure out
>>> what
>>>>> is
>>>>>> in Calcite. I spend a lot of my time working with people to write
>>> good
>>>>>> summaries.
>>>>>> 
>>>>>> I’d like some feedback on whether this approach is useful. And to
>> try
>>>> to
>>>>>> teach people how to do it for themselves.
>>>>>> 
>>>>>> Consider this case
>>> https://issues.apache.org/jira/browse/CALCITE-4983
>>>> <
>>>>>> https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a
>>> still
>>>>>> current case because it doesn’t yet have an ‘answer’.)
>>>>>> 
>>>>>> The current summary is
>>>>>> 
>>>>>>> In SubstitutionVisitor's unifyAggregates, if Aggregate has
>>>>>>> grouping sets, we need to handle the condition needs to pull up.
>>>>>> 
>>>>>> It describes the cause but it doesn’t describe the problem (or the
>>>>>> symptoms the user sees).
>>>>>> 
>>>>>> If you take your car into your mechanic the cause is ‘Leaky gasket
>>>>> results
>>>>>> in oil dripping onto hot manifold’ but the problem is ‘Smoke comes
>>> from
>>>>>> hood when engine gets hot’. Do you agree that the second
>> description
>>> is
>>>>>> much more useful?
>>>>>> 
>>>>>> In this case, the author came up with an example:
>>>>>> 
>>>>>>> Here is an example:
>>>>>>> 
>>>>>>> sql: select empid, deptno from emps group by grouping sets
>> ((empid,
>>>>>> deptno),(empid))
>>>>>>> mv: select empid, count(distinct deptno) from emps where
>> empid>100
>>>>>>>  group by grouping sets ((empid, deptno), (empid))
>>>>>>> 
>>>>>>> the result plan is:
>>>>>>> 
>>>>>>>  LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
>>>>>>>    LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]],
>>>>>> EXPR$1=[COUNT(DISTINCT $1)])
>>>>>>>      EnumerableTableScan(table=[[hr, MV0]])
>>>>>>> 
>>>>>>> We can see that this plan doesn't handle the condition empid>100
>>>>>> 
>>>>>> I think it’s a great example. I especially like the last line,
>> where
>>>> the
>>>>>> author pointed out what was wrong. I suggest the following summary:
>>>>>> 
>>>>>>> Incorrect plan for query that has GROUPING SETS and WHERE
>>>>>> 
>>>>>> Do you think the summary is more useful? Can it be improved?
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> This message contains confidential information and is intended only for
>>>> the
>>>> individuals named. If you are not the named addressee you should not
>>>> disseminate, distribute or copy this e-mail. Please notify the sender
>>>> immediately by e-mail if you have received this e-mail by mistake and
>>>> delete this e-mail from your system. E-mail transmission cannot be
>>>> guaranteed to be secure or error-free as information could be
>>> intercepted,
>>>> corrupted, lost, destroyed, arrive late or incomplete, or contain
>>> viruses.
>>>> The sender therefore does not accept liability for any errors or
>>> omissions
>>>> in the contents of this message, which arise as a result of e-mail
>>>> transmission. If verification is required, please request a hard-copy
>>>> version. -Hazelcast
>>>> 
>>> 
>> 


Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Jacques Nadeau <ja...@apache.org>.
I don't think we should have any rules about summaries. Per Vladimir Ozerov
comments, being militant about something that is relatively subjective is a
good way to dissuade high quality contribution. I definitely find that
rules around such minor things distract from high quality discussion
around meaty architectural and design topics. As long as the case in its
entirety does an okay job of communicating cause and effect, let's not get
caught up in where each is expressed. (Frankly, many of the issues at hand
are so complex that I often find that if you really want to understand what
is going on, you're better off reviewing the added test cases
associated with a change.)

This actually reminds me of something else I've seen in some communities
where there has been great sensitivity to the grammar, spelling, etc of
issues written. I think we should also avoid chasing that pattern. It's an
anglo-centric approach to open source. Contributions come from all over the
world. If the ideas are communicated, let's not punish people who may speak
English as a third or fourth language.



On Thu, Jan 13, 2022 at 7:20 PM Yanjing Wang <zh...@gmail.com>
wrote:

> I agree with Vladimir Ozerov, When I was a newcomer to Calcite, I just used
> its core part rather than framework, so I don't know some concepts. I don't
> know what JDBC adapter really is,  such as CALCITE-4901, CALCITE-4740 I
> reported before. I don't know why the title needs to add 'JDBC adapter '.
> Another example is CALCITE-4683, I don't know how I could describe the
> symptom, because it's just an online bad case, I can't summarize it well
> before I found a more simple and representative case. Frankly, as newcomers
> will be more, I think the community needs patience and a friendly guide to
> make the summaries friendly to the most contributors. I appreciate Julian
> Hyde's work very much, He found the summary problem and tried to resolve it
> which gave me lots of trouble for a long time.
>
> xiong duan <no...@gmail.com> 于2022年1月13日周四 19:56写道:
>
> > Hi all,
> >   I prefer to describe the symptoms the user sees. Because For developers
> > and users both can know what happens when Calcite has a wrong result or
> > unexpected exception. If the user encounters a problem, I guess he wants
> to
> > find the related symptoms summary in the git log, not the root cause. The
> > root cause requires the user have to do some debug or know related
> > implement in Calcite. I think this is not very friendly to describe
> > the root cause. When we release a new Calcite version with use the
> > description of the symptoms, I think the commit log is very friendly for
> > users and followers to know what has been fixed or improved.
> >    But for CALCITE-4983, I think  `Incorrect plan for query that has
> > GROUPING SETS and WHERE` is not enough, Because only MV rewriting have
> > wrong according to the description, not all queries. So I suggest `MV
> > rewriting generates the incorrect plan for query that has GROUPING SETS
> and
> > WHERE`
> >   In addition, I admit this forced me to change my natural way of
> thinking
> > and not every issue I can describe correctly. But I think this is good
> for
> > me.
> >
> > Viliam Durina <vi...@hazelcast.com> 于2022年1月13日周四 16:11写道:
> >
> > > Hi all,
> > >
> > > I also prefer root cause over end effect, if it's known. The end effect
> > > should definitely be mentioned in the body. I even tend to edit
> summaries
> > > when I start with the end effect and find out the root cause later.
> > >
> > > I also prefer a more impersonal tone. In our example I prefer this:
> > >
> > > > SubstitutionVisitor.unifyAggregates() doesn't handle conditions if
> > > Aggregate
> > > > has grouping sets
> > >
> > > Viliam
> > >
> > > On Wed, 12 Jan 2022 at 20:38, Vladimir Ozerov <pp...@gmail.com>
> > wrote:
> > >
> > > > Hi Julian,
> > > >
> > > > In my opinion, both ways work well. People tend to think differently.
> > > Some
> > > > prefer symptoms, others - the root cause. I personally prefer the
> > latter
> > > > for the following reason. If I face a problem, I first try to debug
> it
> > on
> > > > my own. The result of the analysis is usually some questionable
> > behavior
> > > in
> > > > a specific part of the code. Once you find the problematic place, you
> > can
> > > > run a search in JIRA or Git log (class name, feature name, etc) and
> > check
> > > > whether somebody else faced a similar issue. The description
> "Incorrect
> > > > plan ..." is less likely to help me than more concrete "In
> > > > SubstitutionVisitor ...". Especially, given that a single root cause
> > may
> > > > manifest in several ways. But I would like to stress out - it is a
> > matter
> > > > of personal habits and previous experience, not something that I
> > > > expect others to follow.
> > > >
> > > > In the past, I worked on the Apache Ignite project. We had a number
> of
> > > > contribution rules, such as "put a comma here", "set the proper
> > component
> > > > there", "write the comment in that way", etc. I was the one who
> > actively
> > > > enforced this for a may years, because it gave the feeling that
> > > everything
> > > > is "put in order". Eventually, I came to the conclusion that this
> does
> > > more
> > > > harm than good, because I regularly observed confusion and
> > > dissatisfaction
> > > > of the new contributors (and Apache Ignite community is far less
> > diverse
> > > > and active than in Apache Calcite), as they were forced to change
> their
> > > > natural way of thinking or past habits to engage with the community.
> > > >
> > > > Regards,
> > > > Vladimir.
> > > >
> > > > ср, 12 янв. 2022 г. в 21:42, Julian Hyde <jh...@gmail.com>:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Can we discuss how we write summaries for Jira cases? In my opinion
> > > it’s
> > > > > really important, because summaries become commit messages, and
> > commit
> > > > > messages become release notes, which is how most people figure out
> > what
> > > > is
> > > > > in Calcite. I spend a lot of my time working with people to write
> > good
> > > > > summaries.
> > > > >
> > > > > I’d like some feedback on whether this approach is useful. And to
> try
> > > to
> > > > > teach people how to do it for themselves.
> > > > >
> > > > > Consider this case
> > https://issues.apache.org/jira/browse/CALCITE-4983
> > > <
> > > > > https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a
> > still
> > > > > current case because it doesn’t yet have an ‘answer’.)
> > > > >
> > > > > The current summary is
> > > > >
> > > > > >  In SubstitutionVisitor's unifyAggregates, if Aggregate has
> > > > > >  grouping sets, we need to handle the condition needs to pull up.
> > > > >
> > > > > It describes the cause but it doesn’t describe the problem (or the
> > > > > symptoms the user sees).
> > > > >
> > > > > If you take your car into your mechanic the cause is ‘Leaky gasket
> > > > results
> > > > > in oil dripping onto hot manifold’ but the problem is ‘Smoke comes
> > from
> > > > > hood when engine gets hot’. Do you agree that the second
> description
> > is
> > > > > much more useful?
> > > > >
> > > > > In this case, the author came up with an example:
> > > > >
> > > > > > Here is an example:
> > > > > >
> > > > > > sql: select empid, deptno from emps group by grouping sets
> ((empid,
> > > > > deptno),(empid))
> > > > > > mv: select empid, count(distinct deptno) from emps where
> empid>100
> > > > > >   group by grouping sets ((empid, deptno), (empid))
> > > > > >
> > > > > > the result plan is:
> > > > > >
> > > > > >   LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
> > > > > >     LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]],
> > > > > EXPR$1=[COUNT(DISTINCT $1)])
> > > > > >       EnumerableTableScan(table=[[hr, MV0]])
> > > > > >
> > > > > > We can see that this plan doesn't handle the condition empid>100
> > > > >
> > > > > I think it’s a great example. I especially like the last line,
> where
> > > the
> > > > > author pointed out what was wrong. I suggest the following summary:
> > > > >
> > > > > > Incorrect plan for query that has GROUPING SETS and WHERE
> > > > >
> > > > > Do you think the summary is more useful? Can it be improved?
> > > > >
> > > > > Julian
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > > --
> > > This message contains confidential information and is intended only for
> > > the
> > > individuals named. If you are not the named addressee you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately by e-mail if you have received this e-mail by mistake and
> > > delete this e-mail from your system. E-mail transmission cannot be
> > > guaranteed to be secure or error-free as information could be
> > intercepted,
> > > corrupted, lost, destroyed, arrive late or incomplete, or contain
> > viruses.
> > > The sender therefore does not accept liability for any errors or
> > omissions
> > > in the contents of this message, which arise as a result of e-mail
> > > transmission. If verification is required, please request a hard-copy
> > > version. -Hazelcast
> > >
> >
>

Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Yanjing Wang <zh...@gmail.com>.
I agree with Vladimir Ozerov, When I was a newcomer to Calcite, I just used
its core part rather than framework, so I don't know some concepts. I don't
know what JDBC adapter really is,  such as CALCITE-4901, CALCITE-4740 I
reported before. I don't know why the title needs to add 'JDBC adapter '.
Another example is CALCITE-4683, I don't know how I could describe the
symptom, because it's just an online bad case, I can't summarize it well
before I found a more simple and representative case. Frankly, as newcomers
will be more, I think the community needs patience and a friendly guide to
make the summaries friendly to the most contributors. I appreciate Julian
Hyde's work very much, He found the summary problem and tried to resolve it
which gave me lots of trouble for a long time.

xiong duan <no...@gmail.com> 于2022年1月13日周四 19:56写道:

> Hi all,
>   I prefer to describe the symptoms the user sees. Because For developers
> and users both can know what happens when Calcite has a wrong result or
> unexpected exception. If the user encounters a problem, I guess he wants to
> find the related symptoms summary in the git log, not the root cause. The
> root cause requires the user have to do some debug or know related
> implement in Calcite. I think this is not very friendly to describe
> the root cause. When we release a new Calcite version with use the
> description of the symptoms, I think the commit log is very friendly for
> users and followers to know what has been fixed or improved.
>    But for CALCITE-4983, I think  `Incorrect plan for query that has
> GROUPING SETS and WHERE` is not enough, Because only MV rewriting have
> wrong according to the description, not all queries. So I suggest `MV
> rewriting generates the incorrect plan for query that has GROUPING SETS and
> WHERE`
>   In addition, I admit this forced me to change my natural way of thinking
> and not every issue I can describe correctly. But I think this is good for
> me.
>
> Viliam Durina <vi...@hazelcast.com> 于2022年1月13日周四 16:11写道:
>
> > Hi all,
> >
> > I also prefer root cause over end effect, if it's known. The end effect
> > should definitely be mentioned in the body. I even tend to edit summaries
> > when I start with the end effect and find out the root cause later.
> >
> > I also prefer a more impersonal tone. In our example I prefer this:
> >
> > > SubstitutionVisitor.unifyAggregates() doesn't handle conditions if
> > Aggregate
> > > has grouping sets
> >
> > Viliam
> >
> > On Wed, 12 Jan 2022 at 20:38, Vladimir Ozerov <pp...@gmail.com>
> wrote:
> >
> > > Hi Julian,
> > >
> > > In my opinion, both ways work well. People tend to think differently.
> > Some
> > > prefer symptoms, others - the root cause. I personally prefer the
> latter
> > > for the following reason. If I face a problem, I first try to debug it
> on
> > > my own. The result of the analysis is usually some questionable
> behavior
> > in
> > > a specific part of the code. Once you find the problematic place, you
> can
> > > run a search in JIRA or Git log (class name, feature name, etc) and
> check
> > > whether somebody else faced a similar issue. The description "Incorrect
> > > plan ..." is less likely to help me than more concrete "In
> > > SubstitutionVisitor ...". Especially, given that a single root cause
> may
> > > manifest in several ways. But I would like to stress out - it is a
> matter
> > > of personal habits and previous experience, not something that I
> > > expect others to follow.
> > >
> > > In the past, I worked on the Apache Ignite project. We had a number of
> > > contribution rules, such as "put a comma here", "set the proper
> component
> > > there", "write the comment in that way", etc. I was the one who
> actively
> > > enforced this for a may years, because it gave the feeling that
> > everything
> > > is "put in order". Eventually, I came to the conclusion that this does
> > more
> > > harm than good, because I regularly observed confusion and
> > dissatisfaction
> > > of the new contributors (and Apache Ignite community is far less
> diverse
> > > and active than in Apache Calcite), as they were forced to change their
> > > natural way of thinking or past habits to engage with the community.
> > >
> > > Regards,
> > > Vladimir.
> > >
> > > ср, 12 янв. 2022 г. в 21:42, Julian Hyde <jh...@gmail.com>:
> > >
> > > > Hi all,
> > > >
> > > > Can we discuss how we write summaries for Jira cases? In my opinion
> > it’s
> > > > really important, because summaries become commit messages, and
> commit
> > > > messages become release notes, which is how most people figure out
> what
> > > is
> > > > in Calcite. I spend a lot of my time working with people to write
> good
> > > > summaries.
> > > >
> > > > I’d like some feedback on whether this approach is useful. And to try
> > to
> > > > teach people how to do it for themselves.
> > > >
> > > > Consider this case
> https://issues.apache.org/jira/browse/CALCITE-4983
> > <
> > > > https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a
> still
> > > > current case because it doesn’t yet have an ‘answer’.)
> > > >
> > > > The current summary is
> > > >
> > > > >  In SubstitutionVisitor's unifyAggregates, if Aggregate has
> > > > >  grouping sets, we need to handle the condition needs to pull up.
> > > >
> > > > It describes the cause but it doesn’t describe the problem (or the
> > > > symptoms the user sees).
> > > >
> > > > If you take your car into your mechanic the cause is ‘Leaky gasket
> > > results
> > > > in oil dripping onto hot manifold’ but the problem is ‘Smoke comes
> from
> > > > hood when engine gets hot’. Do you agree that the second description
> is
> > > > much more useful?
> > > >
> > > > In this case, the author came up with an example:
> > > >
> > > > > Here is an example:
> > > > >
> > > > > sql: select empid, deptno from emps group by grouping sets ((empid,
> > > > deptno),(empid))
> > > > > mv: select empid, count(distinct deptno) from emps where empid>100
> > > > >   group by grouping sets ((empid, deptno), (empid))
> > > > >
> > > > > the result plan is:
> > > > >
> > > > >   LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
> > > > >     LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]],
> > > > EXPR$1=[COUNT(DISTINCT $1)])
> > > > >       EnumerableTableScan(table=[[hr, MV0]])
> > > > >
> > > > > We can see that this plan doesn't handle the condition empid>100
> > > >
> > > > I think it’s a great example. I especially like the last line, where
> > the
> > > > author pointed out what was wrong. I suggest the following summary:
> > > >
> > > > > Incorrect plan for query that has GROUPING SETS and WHERE
> > > >
> > > > Do you think the summary is more useful? Can it be improved?
> > > >
> > > > Julian
> > > >
> > > >
> > > >
> > > >
> > >
> >
> > --
> > This message contains confidential information and is intended only for
> > the
> > individuals named. If you are not the named addressee you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately by e-mail if you have received this e-mail by mistake and
> > delete this e-mail from your system. E-mail transmission cannot be
> > guaranteed to be secure or error-free as information could be
> intercepted,
> > corrupted, lost, destroyed, arrive late or incomplete, or contain
> viruses.
> > The sender therefore does not accept liability for any errors or
> omissions
> > in the contents of this message, which arise as a result of e-mail
> > transmission. If verification is required, please request a hard-copy
> > version. -Hazelcast
> >
>

Re: [DISCUSS] Writing good summaries for Jira cases

Posted by xiong duan <no...@gmail.com>.
Hi all,
  I prefer to describe the symptoms the user sees. Because For developers
and users both can know what happens when Calcite has a wrong result or
unexpected exception. If the user encounters a problem, I guess he wants to
find the related symptoms summary in the git log, not the root cause. The
root cause requires the user have to do some debug or know related
implement in Calcite. I think this is not very friendly to describe
the root cause. When we release a new Calcite version with use the
description of the symptoms, I think the commit log is very friendly for
users and followers to know what has been fixed or improved.
   But for CALCITE-4983, I think  `Incorrect plan for query that has
GROUPING SETS and WHERE` is not enough, Because only MV rewriting have
wrong according to the description, not all queries. So I suggest `MV
rewriting generates the incorrect plan for query that has GROUPING SETS and
WHERE`
  In addition, I admit this forced me to change my natural way of thinking
and not every issue I can describe correctly. But I think this is good for
me.

Viliam Durina <vi...@hazelcast.com> 于2022年1月13日周四 16:11写道:

> Hi all,
>
> I also prefer root cause over end effect, if it's known. The end effect
> should definitely be mentioned in the body. I even tend to edit summaries
> when I start with the end effect and find out the root cause later.
>
> I also prefer a more impersonal tone. In our example I prefer this:
>
> > SubstitutionVisitor.unifyAggregates() doesn't handle conditions if
> Aggregate
> > has grouping sets
>
> Viliam
>
> On Wed, 12 Jan 2022 at 20:38, Vladimir Ozerov <pp...@gmail.com> wrote:
>
> > Hi Julian,
> >
> > In my opinion, both ways work well. People tend to think differently.
> Some
> > prefer symptoms, others - the root cause. I personally prefer the latter
> > for the following reason. If I face a problem, I first try to debug it on
> > my own. The result of the analysis is usually some questionable behavior
> in
> > a specific part of the code. Once you find the problematic place, you can
> > run a search in JIRA or Git log (class name, feature name, etc) and check
> > whether somebody else faced a similar issue. The description "Incorrect
> > plan ..." is less likely to help me than more concrete "In
> > SubstitutionVisitor ...". Especially, given that a single root cause may
> > manifest in several ways. But I would like to stress out - it is a matter
> > of personal habits and previous experience, not something that I
> > expect others to follow.
> >
> > In the past, I worked on the Apache Ignite project. We had a number of
> > contribution rules, such as "put a comma here", "set the proper component
> > there", "write the comment in that way", etc. I was the one who actively
> > enforced this for a may years, because it gave the feeling that
> everything
> > is "put in order". Eventually, I came to the conclusion that this does
> more
> > harm than good, because I regularly observed confusion and
> dissatisfaction
> > of the new contributors (and Apache Ignite community is far less diverse
> > and active than in Apache Calcite), as they were forced to change their
> > natural way of thinking or past habits to engage with the community.
> >
> > Regards,
> > Vladimir.
> >
> > ср, 12 янв. 2022 г. в 21:42, Julian Hyde <jh...@gmail.com>:
> >
> > > Hi all,
> > >
> > > Can we discuss how we write summaries for Jira cases? In my opinion
> it’s
> > > really important, because summaries become commit messages, and commit
> > > messages become release notes, which is how most people figure out what
> > is
> > > in Calcite. I spend a lot of my time working with people to write good
> > > summaries.
> > >
> > > I’d like some feedback on whether this approach is useful. And to try
> to
> > > teach people how to do it for themselves.
> > >
> > > Consider this case https://issues.apache.org/jira/browse/CALCITE-4983
> <
> > > https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a still
> > > current case because it doesn’t yet have an ‘answer’.)
> > >
> > > The current summary is
> > >
> > > >  In SubstitutionVisitor's unifyAggregates, if Aggregate has
> > > >  grouping sets, we need to handle the condition needs to pull up.
> > >
> > > It describes the cause but it doesn’t describe the problem (or the
> > > symptoms the user sees).
> > >
> > > If you take your car into your mechanic the cause is ‘Leaky gasket
> > results
> > > in oil dripping onto hot manifold’ but the problem is ‘Smoke comes from
> > > hood when engine gets hot’. Do you agree that the second description is
> > > much more useful?
> > >
> > > In this case, the author came up with an example:
> > >
> > > > Here is an example:
> > > >
> > > > sql: select empid, deptno from emps group by grouping sets ((empid,
> > > deptno),(empid))
> > > > mv: select empid, count(distinct deptno) from emps where empid>100
> > > >   group by grouping sets ((empid, deptno), (empid))
> > > >
> > > > the result plan is:
> > > >
> > > >   LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
> > > >     LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]],
> > > EXPR$1=[COUNT(DISTINCT $1)])
> > > >       EnumerableTableScan(table=[[hr, MV0]])
> > > >
> > > > We can see that this plan doesn't handle the condition empid>100
> > >
> > > I think it’s a great example. I especially like the last line, where
> the
> > > author pointed out what was wrong. I suggest the following summary:
> > >
> > > > Incorrect plan for query that has GROUPING SETS and WHERE
> > >
> > > Do you think the summary is more useful? Can it be improved?
> > >
> > > Julian
> > >
> > >
> > >
> > >
> >
>
> --
> This message contains confidential information and is intended only for
> the
> individuals named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately by e-mail if you have received this e-mail by mistake and
> delete this e-mail from your system. E-mail transmission cannot be
> guaranteed to be secure or error-free as information could be intercepted,
> corrupted, lost, destroyed, arrive late or incomplete, or contain viruses.
> The sender therefore does not accept liability for any errors or omissions
> in the contents of this message, which arise as a result of e-mail
> transmission. If verification is required, please request a hard-copy
> version. -Hazelcast
>

Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Viliam Durina <vi...@hazelcast.com>.
Hi all,

I also prefer root cause over end effect, if it's known. The end effect
should definitely be mentioned in the body. I even tend to edit summaries
when I start with the end effect and find out the root cause later.

I also prefer a more impersonal tone. In our example I prefer this:

> SubstitutionVisitor.unifyAggregates() doesn't handle conditions if
Aggregate
> has grouping sets

Viliam

On Wed, 12 Jan 2022 at 20:38, Vladimir Ozerov <pp...@gmail.com> wrote:

> Hi Julian,
>
> In my opinion, both ways work well. People tend to think differently. Some
> prefer symptoms, others - the root cause. I personally prefer the latter
> for the following reason. If I face a problem, I first try to debug it on
> my own. The result of the analysis is usually some questionable behavior in
> a specific part of the code. Once you find the problematic place, you can
> run a search in JIRA or Git log (class name, feature name, etc) and check
> whether somebody else faced a similar issue. The description "Incorrect
> plan ..." is less likely to help me than more concrete "In
> SubstitutionVisitor ...". Especially, given that a single root cause may
> manifest in several ways. But I would like to stress out - it is a matter
> of personal habits and previous experience, not something that I
> expect others to follow.
>
> In the past, I worked on the Apache Ignite project. We had a number of
> contribution rules, such as "put a comma here", "set the proper component
> there", "write the comment in that way", etc. I was the one who actively
> enforced this for a may years, because it gave the feeling that everything
> is "put in order". Eventually, I came to the conclusion that this does more
> harm than good, because I regularly observed confusion and dissatisfaction
> of the new contributors (and Apache Ignite community is far less diverse
> and active than in Apache Calcite), as they were forced to change their
> natural way of thinking or past habits to engage with the community.
>
> Regards,
> Vladimir.
>
> ср, 12 янв. 2022 г. в 21:42, Julian Hyde <jh...@gmail.com>:
>
> > Hi all,
> >
> > Can we discuss how we write summaries for Jira cases? In my opinion it’s
> > really important, because summaries become commit messages, and commit
> > messages become release notes, which is how most people figure out what
> is
> > in Calcite. I spend a lot of my time working with people to write good
> > summaries.
> >
> > I’d like some feedback on whether this approach is useful. And to try to
> > teach people how to do it for themselves.
> >
> > Consider this case https://issues.apache.org/jira/browse/CALCITE-4983 <
> > https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a still
> > current case because it doesn’t yet have an ‘answer’.)
> >
> > The current summary is
> >
> > >  In SubstitutionVisitor's unifyAggregates, if Aggregate has
> > >  grouping sets, we need to handle the condition needs to pull up.
> >
> > It describes the cause but it doesn’t describe the problem (or the
> > symptoms the user sees).
> >
> > If you take your car into your mechanic the cause is ‘Leaky gasket
> results
> > in oil dripping onto hot manifold’ but the problem is ‘Smoke comes from
> > hood when engine gets hot’. Do you agree that the second description is
> > much more useful?
> >
> > In this case, the author came up with an example:
> >
> > > Here is an example:
> > >
> > > sql: select empid, deptno from emps group by grouping sets ((empid,
> > deptno),(empid))
> > > mv: select empid, count(distinct deptno) from emps where empid>100
> > >   group by grouping sets ((empid, deptno), (empid))
> > >
> > > the result plan is:
> > >
> > >   LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
> > >     LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]],
> > EXPR$1=[COUNT(DISTINCT $1)])
> > >       EnumerableTableScan(table=[[hr, MV0]])
> > >
> > > We can see that this plan doesn't handle the condition empid>100
> >
> > I think it’s a great example. I especially like the last line, where the
> > author pointed out what was wrong. I suggest the following summary:
> >
> > > Incorrect plan for query that has GROUPING SETS and WHERE
> >
> > Do you think the summary is more useful? Can it be improved?
> >
> > Julian
> >
> >
> >
> >
>

-- 
This message contains confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be intercepted, 
corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. 
The sender therefore does not accept liability for any errors or omissions 
in the contents of this message, which arise as a result of e-mail 
transmission. If verification is required, please request a hard-copy 
version. -Hazelcast

Re: [DISCUSS] Writing good summaries for Jira cases

Posted by Vladimir Ozerov <pp...@gmail.com>.
Hi Julian,

In my opinion, both ways work well. People tend to think differently. Some
prefer symptoms, others - the root cause. I personally prefer the latter
for the following reason. If I face a problem, I first try to debug it on
my own. The result of the analysis is usually some questionable behavior in
a specific part of the code. Once you find the problematic place, you can
run a search in JIRA or Git log (class name, feature name, etc) and check
whether somebody else faced a similar issue. The description "Incorrect
plan ..." is less likely to help me than more concrete "In
SubstitutionVisitor ...". Especially, given that a single root cause may
manifest in several ways. But I would like to stress out - it is a matter
of personal habits and previous experience, not something that I
expect others to follow.

In the past, I worked on the Apache Ignite project. We had a number of
contribution rules, such as "put a comma here", "set the proper component
there", "write the comment in that way", etc. I was the one who actively
enforced this for a may years, because it gave the feeling that everything
is "put in order". Eventually, I came to the conclusion that this does more
harm than good, because I regularly observed confusion and dissatisfaction
of the new contributors (and Apache Ignite community is far less diverse
and active than in Apache Calcite), as they were forced to change their
natural way of thinking or past habits to engage with the community.

Regards,
Vladimir.

ср, 12 янв. 2022 г. в 21:42, Julian Hyde <jh...@gmail.com>:

> Hi all,
>
> Can we discuss how we write summaries for Jira cases? In my opinion it’s
> really important, because summaries become commit messages, and commit
> messages become release notes, which is how most people figure out what is
> in Calcite. I spend a lot of my time working with people to write good
> summaries.
>
> I’d like some feedback on whether this approach is useful. And to try to
> teach people how to do it for themselves.
>
> Consider this case https://issues.apache.org/jira/browse/CALCITE-4983 <
> https://issues.apache.org/jira/browse/CALCITE-4983>. (I chose a still
> current case because it doesn’t yet have an ‘answer’.)
>
> The current summary is
>
> >  In SubstitutionVisitor's unifyAggregates, if Aggregate has
> >  grouping sets, we need to handle the condition needs to pull up.
>
> It describes the cause but it doesn’t describe the problem (or the
> symptoms the user sees).
>
> If you take your car into your mechanic the cause is ‘Leaky gasket results
> in oil dripping onto hot manifold’ but the problem is ‘Smoke comes from
> hood when engine gets hot’. Do you agree that the second description is
> much more useful?
>
> In this case, the author came up with an example:
>
> > Here is an example:
> >
> > sql: select empid, deptno from emps group by grouping sets ((empid,
> deptno),(empid))
> > mv: select empid, count(distinct deptno) from emps where empid>100
> >   group by grouping sets ((empid, deptno), (empid))
> >
> > the result plan is:
> >
> >   LogicalCalc(expr#0..2=[{inputs}], deptno=[$t1], EXPR$1=[$t2])
> >     LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}]],
> EXPR$1=[COUNT(DISTINCT $1)])
> >       EnumerableTableScan(table=[[hr, MV0]])
> >
> > We can see that this plan doesn't handle the condition empid>100
>
> I think it’s a great example. I especially like the last line, where the
> author pointed out what was wrong. I suggest the following summary:
>
> > Incorrect plan for query that has GROUPING SETS and WHERE
>
> Do you think the summary is more useful? Can it be improved?
>
> Julian
>
>
>
>