You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2018/10/15 18:10:19 UTC

Exception-handling in built-in functions

There’s a discussion on https://issues.apache.org/jira/browse/CALCITE-525, "Exception-handling in built-in functions” that seems to be heading for stalemate. There is a PR that I am inclined to accept (with some modifications, probably), and based on his comments so far, Vladimir seems to be set against.

We had our first commit veto from Vladimir a couple of months ago, and I don’t want to have another.

How can we work through this and get to consensus?

Julian

Re: Re: Exception-handling in built-in functions

Posted by "hongzezhang (张宏泽)" <ho...@tencent.com>.

Hi,

Thanks for opening this discussion and thank you all for your replies.

The reason why I would like to help solve CALCITE-525 is because I agree that an function error should not easily break a large job, this "job" could indicate a large ad-hoc query or a query on a stream, or like what Julian said, a ETL job.
According to some tests I had run about Calcite streaming, a simple division by zero error could break the whole query job. Say, If Calcite does not provide a option to silence the error, users should spend extract daily time on checking whether their streaming job is alive if their SQL includes a "/" operator.

The solutions of B and C are different: B is to ignore and drop a whole row when error happens, and C is to make the failed call return null value. In C, the major reason to introduce "CATCH_ERROR" function is to represent the error handling on Rel, whether to enable the function in Parser is not important.
At first I personally prefer C, but as C does need more code change than B, I don't have a very strong inclination between B and C now.

At last, I agree that making enough discussion on introducing new feature is very important. If we don't really need such a feature now, I think I can help whenever it is considered helpful in future.

Thanks,
Hongze
________________________________
notifyzhz@126.com

From: Jesus Camacho Rodriguez<ma...@hortonworks.com>
Date: 2018-10-18 10:26
To: dev@calcite.apache.org<ma...@calcite.apache.org>
Subject: Re: Exception-handling in built-in functions
I do not believe there is enough reason to block CALCITE-525. IMO, CALCITE-525 describes a problem that some Calcite users are facing and a reasonable plugable solution. We should not be vetoing such a feature without providing viable alternatives. (Without having checked the specific implementation details, I prefer approach B described below as it is less intrusive. And A should be fixed in a different issue.)
I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or Postgres, and it has always tried to provide flexibility to underlying engines, one of the reasons for its wide adoption. In addition, systems are not forced to use this feature, it is tagged as experimental and by default we are still running in same mode. I believe that is sufficient.
Personally, I will not be happy if a developer feels compelled to fork Calcite or stop contributing code because we do not accept features such as the one described there.

Thanks,
Jesús

On 10/17/18, 5:17 PM, "Michael Mior" <mm...@apache.org> wrote:

    My apologies for missing this thread a couple days ago. (Thanks for pinging
    it.) Here's my two cents: taking care of contributors to the project is
    just as important (if not more important) than taking care of the code. I'm
    not saying we should merge terrible code just to keep each other happy, but
    I don't think that's the case here. If anyone writes some code which you
    disagree with, you should be free to voice your disagreement. However,
    especially when the code is from a core contributor and the argument
    focuses on potential future problems, I think it's important to consider
    that people who have shown dedication to the project over the years are
    very likely to be around and willing to fix these problems as they arise.

    Code which turns out to cause problems can always be deleted, reverted,
    refactored, etc. It's much harder to back out when a contributor is burned
    out or interpersonal conflicts get heated.

    --
    Michael Mior
    mmior@apache.org

    Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jh...@apache.org> a écrit :

    > Vladimir,
    >
    > You’ve made your points. And I hear them.
    >
    > However I get the impression that you are not open to persuasion. Which
    > means that I am wasting my time trying to reach consensus with you. Which
    > means that people win arguments not on merit, but based upon who is most
    > persistent.
    >
    > Here is my point. Calcite's goal is not to re-create what Oracle or
    > PostgreSQL did ten years later. It is a platform that allows people to
    > write their own data engine. If they want to redefine the “+” operator such
    > that 2 + 2 returns 5, the platform should allow it.
    >
    > Certainly if they want to engineer their own error-handling strategy, we
    > should let them do it. I didn’t have the energy to find an example of a SQL
    > engine that discards rows with divide-by-zero errors, but I believe there
    > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
    > engines that I have worked on that performed ETL-like tasks, all had that
    > capability. And all ETL tools have very flexible error-handling strategies.
    > They are not SQL-based, but Calcite is not exclusively for SQL systems.
    >
    > I have been designing and building world-class data engines for 30 years.
    > Please take me on good faith that a flexible error-handing strategy is a
    > good idea. Don’t force me to bicker over email for hours and hours. When a
    > long discussion leads to the rejection of a contribution, I get
    > considerably closer to burning out.
    >
    > Julian
    >
    >
    > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
    > sitnikov.vladimir@gmail.com> wrote:
    > >
    > > Juilian>Hey, folks. We need your input here.
    > >
    > > Here are my thoughts:
    > > 1) I think the features we add should have at least some level of
    > > consistency
    > > 2) It is much safer to adopt well-known features rather than be pioneers
    > in
    > > the field. I do not mean we must wait for someone else implement and try
    > > out a feature, however I would not rush for implementing a feature that
    > > no-one else explored.
    > >
    > > CALCITE-525 has two key points:
    > > A) Current implementation of enumerable factors code like 0/0 to a static
    > > field of a generated code. It causes the generated code to fail at load
    > > time even before the query is executed.
    > > Of course that is a bug, and I'm even inclined to remove that "static
    > > fields"
    > >
    > > B) Someone (Hongze? Juilan?) suggest to implement a mode to silently
    > ignore
    > > the error (e.g. by ignoring the row or by returning default value).
    > > First of all, I don't think "ignore the row" kind of processing would do
    > > any good to the user since it would not be possible to predict the
    > output.
    > > "ignore the row" is very tricky when join/semijoin/antijoin is there.
    > >
    > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think
    > we
    > > should not rush for it.
    > >
    > > C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
    > > 0)  EMPTY ON ERROR  kind of functions.
    > > That enables to confine the scope of the error, however I don't think it
    > > would be used often (does that mean one would have to wrap each
    > > expression?), and this "catch error" would be non-trivial to propagate to
    > > the downstream executors.
    > > On top of that, we might end up inventing full-blown
    > try-catch-catch-catch
    > > syntax.
    > >
    > > I truly see no business value in implementing B/C, however I do see the
    > > pain it would introduce. It would complicate Calcite maintenance. "B"
    > could
    > > silently produce wrong results, and I'm sure we don't want get results
    > out
    > > of thin air.
    > >
    > > Vladimir
    >
    >

Re: Re: Exception-handling in built-in functions

Posted by Stamatis Zampetakis <za...@gmail.com>.

Hello,

First of all, thanks Hongze for working on this!

CALCITE-525 asks for throwing exceptions in certain circumstances and I
think the fix should focus exclusively on this.

Regarding the discussion about additional actions in case of an error
(omitting rows or logging) I believe it is useful but I think it could be
achieved by other means such as overloading the operators (using
SqlOperatorTable). Omitting rows or logging is not a very common policy to
become part of the core project. I think that projects relying on Calcite
and need this kind of functionality could provide their own operator
overloads relatively easy.

Providing additional built in functions, such as CATCH_ERROR, might become
handy in some situations but again it is something that individual projects
could easily plugin as a user defined function.

Note that, I am not against the existing PR but I would like to highlight
an alternative option.

Best,
Stamatis

Στις Πέμ, 18 Οκτ 2018 στις 1:36 μ.μ., ο/η notifyzhz@126.com <
notifyzhz@126.com> έγραψε:

> Hi,
>
> Thanks for opening this discussion and thank you all for your replies.
>
> The reason why I would like to help solve CALCITE-525 is because I agree
> that an function error should not easily break a large job, this "job"
> could indicate a large ad-hoc query or a query on a stream, or like what
> Julian said, a ETL job.
> According to some tests I had run about Calcite streaming, a simple
> division by zero error could break the whole query job. Say, If Calcite
> does not provide a option to silence the error, users should spend extract
> daily time on checking whether their streaming job is alive if their SQL
> includes a "/" operator.
>
> The solutions of B and C are different: B is to ignore and drop a whole
> row when error happens, and C is to make the failed call return null value.
> In C, the major reason to introduce "CATCH_ERROR" function is to represent
> the error handling on Rel, whether to enable the function in Parser is not
> important.
> At first I personally prefer C, but as C does need more code change than
> B, I don't have a very strong inclination between B and C now.
>
> At last, I agree that making enough discussion on introducing new feature
> is very important. If we don't really need such a feature now, I think I
> can help whenever it is considered helpful in future.
>
> Thanks,
> Hongze
>
> From: Jesus Camacho Rodriguez
> Date: 2018-10-18 10:26
> To: dev@calcite.apache.org
> Subject: Re: Exception-handling in built-in functions
> I do not believe there is enough reason to block CALCITE-525. IMO,
> CALCITE-525 describes a problem that some Calcite users are facing and a
> reasonable plugable solution. We should not be vetoing such a feature
> without providing viable alternatives. (Without having checked the specific
> implementation details, I prefer approach B described below as it is less
> intrusive. And A should be fixed in a different issue.)
> I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or
> Postgres, and it has always tried to provide flexibility to underlying
> engines, one of the reasons for its wide adoption. In addition, systems are
> not forced to use this feature, it is tagged as experimental and by default
> we are still running in same mode. I believe that is sufficient.
> Personally, I will not be happy if a developer feels compelled to fork
> Calcite or stop contributing code because we do not accept features such as
> the one described there.
>
> Thanks,
> Jesús
>
>
> On 10/17/18, 5:17 PM, "Michael Mior" <mm...@apache.org> wrote:
>
>     My apologies for missing this thread a couple days ago. (Thanks for
> pinging
>     it.) Here's my two cents: taking care of contributors to the project is
>     just as important (if not more important) than taking care of the
> code. I'm
>     not saying we should merge terrible code just to keep each other
> happy, but
>     I don't think that's the case here. If anyone writes some code which
> you
>     disagree with, you should be free to voice your disagreement. However,
>     especially when the code is from a core contributor and the argument
>     focuses on potential future problems, I think it's important to
> consider
>     that people who have shown dedication to the project over the years are
>     very likely to be around and willing to fix these problems as they
> arise.
>
>     Code which turns out to cause problems can always be deleted, reverted,
>     refactored, etc. It's much harder to back out when a contributor is
> burned
>     out or interpersonal conflicts get heated.
>
>     --
>     Michael Mior
>     mmior@apache.org
>
>
>     Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jh...@apache.org> a écrit :
>
>     > Vladimir,
>     >
>     > You’ve made your points. And I hear them.
>     >
>     > However I get the impression that you are not open to persuasion.
> Which
>     > means that I am wasting my time trying to reach consensus with you.
> Which
>     > means that people win arguments not on merit, but based upon who is
> most
>     > persistent.
>     >
>     > Here is my point. Calcite's goal is not to re-create what Oracle or
>     > PostgreSQL did ten years later. It is a platform that allows people
> to
>     > write their own data engine. If they want to redefine the “+”
> operator such
>     > that 2 + 2 returns 5, the platform should allow it.
>     >
>     > Certainly if they want to engineer their own error-handling
> strategy, we
>     > should let them do it. I didn’t have the energy to find an example
> of a SQL
>     > engine that discards rows with divide-by-zero errors, but I believe
> there
>     > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
>     > engines that I have worked on that performed ETL-like tasks, all had
> that
>     > capability. And all ETL tools have very flexible error-handling
> strategies.
>     > They are not SQL-based, but Calcite is not exclusively for SQL
> systems.
>     >
>     > I have been designing and building world-class data engines for 30
> years.
>     > Please take me on good faith that a flexible error-handing strategy
> is a
>     > good idea. Don’t force me to bicker over email for hours and hours.
> When a
>     > long discussion leads to the rejection of a contribution, I get
>     > considerably closer to burning out.
>     >
>     > Julian
>     >
>     >
>     > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
>     > sitnikov.vladimir@gmail.com> wrote:
>     > >
>     > > Juilian>Hey, folks. We need your input here.
>     > >
>     > > Here are my thoughts:
>     > > 1) I think the features we add should have at least some level of
>     > > consistency
>     > > 2) It is much safer to adopt well-known features rather than be
> pioneers
>     > in
>     > > the field. I do not mean we must wait for someone else implement
> and try
>     > > out a feature, however I would not rush for implementing a feature
> that
>     > > no-one else explored.
>     > >
>     > > CALCITE-525 has two key points:
>     > > A) Current implementation of enumerable factors code like 0/0 to a
> static
>     > > field of a generated code. It causes the generated code to fail at
> load
>     > > time even before the query is executed.
>     > > Of course that is a bug, and I'm even inclined to remove that
> "static
>     > > fields"
>     > >
>     > > B) Someone (Hongze? Juilan?) suggest to implement a mode to
> silently
>     > ignore
>     > > the error (e.g. by ignoring the row or by returning default value).
>     > > First of all, I don't think "ignore the row" kind of processing
> would do
>     > > any good to the user since it would not be possible to predict the
>     > output.
>     > > "ignore the row" is very tricky when join/semijoin/antijoin is
> there.
>     > >
>     > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I
> think
>     > we
>     > > should not rush for it.
>     > >
>     > > C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or
> CATCH_ERROR(1 /
>     > > 0)  EMPTY ON ERROR  kind of functions.
>     > > That enables to confine the scope of the error, however I don't
> think it
>     > > would be used often (does that mean one would have to wrap each
>     > > expression?), and this "catch error" would be non-trivial to
> propagate to
>     > > the downstream executors.
>     > > On top of that, we might end up inventing full-blown
>     > try-catch-catch-catch
>     > > syntax.
>     > >
>     > > I truly see no business value in implementing B/C, however I do
> see the
>     > > pain it would introduce. It would complicate Calcite maintenance.
> "B"
>     > could
>     > > silently produce wrong results, and I'm sure we don't want get
> results
>     > out
>     > > of thin air.
>     > >
>     > > Vladimir
>     >
>     >
>
>
>

Re: Re: Exception-handling in built-in functions

Posted by "notifyzhz@126.com" <no...@126.com>.

Hi,

Thanks for opening this discussion and thank you all for your replies.

The reason why I would like to help solve CALCITE-525 is because I agree that an function error should not easily break a large job, this "job" could indicate a large ad-hoc query or a query on a stream, or like what Julian said, a ETL job.
According to some tests I had run about Calcite streaming, a simple division by zero error could break the whole query job. Say, If Calcite does not provide a option to silence the error, users should spend extract daily time on checking whether their streaming job is alive if their SQL includes a "/" operator.

The solutions of B and C are different: B is to ignore and drop a whole row when error happens, and C is to make the failed call return null value. In C, the major reason to introduce "CATCH_ERROR" function is to represent the error handling on Rel, whether to enable the function in Parser is not important.
At first I personally prefer C, but as C does need more code change than B, I don't have a very strong inclination between B and C now.

At last, I agree that making enough discussion on introducing new feature is very important. If we don't really need such a feature now, I think I can help whenever it is considered helpful in future.

Thanks,
Hongze

From: Jesus Camacho Rodriguez
Date: 2018-10-18 10:26
To: dev@calcite.apache.org
Subject: Re: Exception-handling in built-in functions
I do not believe there is enough reason to block CALCITE-525. IMO, CALCITE-525 describes a problem that some Calcite users are facing and a reasonable plugable solution. We should not be vetoing such a feature without providing viable alternatives. (Without having checked the specific implementation details, I prefer approach B described below as it is less intrusive. And A should be fixed in a different issue.)
I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or Postgres, and it has always tried to provide flexibility to underlying engines, one of the reasons for its wide adoption. In addition, systems are not forced to use this feature, it is tagged as experimental and by default we are still running in same mode. I believe that is sufficient.
Personally, I will not be happy if a developer feels compelled to fork Calcite or stop contributing code because we do not accept features such as the one described there.

Thanks,
Jesús

On 10/17/18, 5:17 PM, "Michael Mior" <mm...@apache.org> wrote:

    My apologies for missing this thread a couple days ago. (Thanks for pinging
    it.) Here's my two cents: taking care of contributors to the project is
    just as important (if not more important) than taking care of the code. I'm
    not saying we should merge terrible code just to keep each other happy, but
    I don't think that's the case here. If anyone writes some code which you
    disagree with, you should be free to voice your disagreement. However,
    especially when the code is from a core contributor and the argument
    focuses on potential future problems, I think it's important to consider
    that people who have shown dedication to the project over the years are
    very likely to be around and willing to fix these problems as they arise.

    Code which turns out to cause problems can always be deleted, reverted,
    refactored, etc. It's much harder to back out when a contributor is burned
    out or interpersonal conflicts get heated.

    --
    Michael Mior
    mmior@apache.org

    Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jh...@apache.org> a écrit :

    > Vladimir,
    >
    > You’ve made your points. And I hear them.
    >
    > However I get the impression that you are not open to persuasion. Which
    > means that I am wasting my time trying to reach consensus with you. Which
    > means that people win arguments not on merit, but based upon who is most
    > persistent.
    >
    > Here is my point. Calcite's goal is not to re-create what Oracle or
    > PostgreSQL did ten years later. It is a platform that allows people to
    > write their own data engine. If they want to redefine the “+” operator such
    > that 2 + 2 returns 5, the platform should allow it.
    >
    > Certainly if they want to engineer their own error-handling strategy, we
    > should let them do it. I didn’t have the energy to find an example of a SQL
    > engine that discards rows with divide-by-zero errors, but I believe there
    > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
    > engines that I have worked on that performed ETL-like tasks, all had that
    > capability. And all ETL tools have very flexible error-handling strategies.
    > They are not SQL-based, but Calcite is not exclusively for SQL systems.
    >
    > I have been designing and building world-class data engines for 30 years.
    > Please take me on good faith that a flexible error-handing strategy is a
    > good idea. Don’t force me to bicker over email for hours and hours. When a
    > long discussion leads to the rejection of a contribution, I get
    > considerably closer to burning out.
    >
    > Julian
    >
    >
    > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
    > sitnikov.vladimir@gmail.com> wrote:
    > >
    > > Juilian>Hey, folks. We need your input here.
    > >
    > > Here are my thoughts:
    > > 1) I think the features we add should have at least some level of
    > > consistency
    > > 2) It is much safer to adopt well-known features rather than be pioneers
    > in
    > > the field. I do not mean we must wait for someone else implement and try
    > > out a feature, however I would not rush for implementing a feature that
    > > no-one else explored.
    > >
    > > CALCITE-525 has two key points:
    > > A) Current implementation of enumerable factors code like 0/0 to a static
    > > field of a generated code. It causes the generated code to fail at load
    > > time even before the query is executed.
    > > Of course that is a bug, and I'm even inclined to remove that "static
    > > fields"
    > >
    > > B) Someone (Hongze? Juilan?) suggest to implement a mode to silently
    > ignore
    > > the error (e.g. by ignoring the row or by returning default value).
    > > First of all, I don't think "ignore the row" kind of processing would do
    > > any good to the user since it would not be possible to predict the
    > output.
    > > "ignore the row" is very tricky when join/semijoin/antijoin is there.
    > >
    > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think
    > we
    > > should not rush for it.
    > >
    > > C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
    > > 0)  EMPTY ON ERROR  kind of functions.
    > > That enables to confine the scope of the error, however I don't think it
    > > would be used often (does that mean one would have to wrap each
    > > expression?), and this "catch error" would be non-trivial to propagate to
    > > the downstream executors.
    > > On top of that, we might end up inventing full-blown
    > try-catch-catch-catch
    > > syntax.
    > >
    > > I truly see no business value in implementing B/C, however I do see the
    > > pain it would introduce. It would complicate Calcite maintenance. "B"
    > could
    > > silently produce wrong results, and I'm sure we don't want get results
    > out
    > > of thin air.
    > >
    > > Vladimir
    >
    >

Re: Exception-handling in built-in functions

Posted by Jesus Camacho Rodriguez <jc...@hortonworks.com>.

I do not believe there is enough reason to block CALCITE-525. IMO, CALCITE-525 describes a problem that some Calcite users are facing and a reasonable plugable solution. We should not be vetoing such a feature without providing viable alternatives. (Without having checked the specific implementation details, I prefer approach B described below as it is less intrusive. And A should be fixed in a different issue.)
I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or Postgres, and it has always tried to provide flexibility to underlying engines, one of the reasons for its wide adoption. In addition, systems are not forced to use this feature, it is tagged as experimental and by default we are still running in same mode. I believe that is sufficient.
Personally, I will not be happy if a developer feels compelled to fork Calcite or stop contributing code because we do not accept features such as the one described there.

Thanks,
Jesús


On 10/17/18, 5:17 PM, "Michael Mior" <mm...@apache.org> wrote:

    My apologies for missing this thread a couple days ago. (Thanks for pinging
    it.) Here's my two cents: taking care of contributors to the project is
    just as important (if not more important) than taking care of the code. I'm
    not saying we should merge terrible code just to keep each other happy, but
    I don't think that's the case here. If anyone writes some code which you
    disagree with, you should be free to voice your disagreement. However,
    especially when the code is from a core contributor and the argument
    focuses on potential future problems, I think it's important to consider
    that people who have shown dedication to the project over the years are
    very likely to be around and willing to fix these problems as they arise.
    
    Code which turns out to cause problems can always be deleted, reverted,
    refactored, etc. It's much harder to back out when a contributor is burned
    out or interpersonal conflicts get heated.
    
    --
    Michael Mior
    mmior@apache.org
    
    
    Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jh...@apache.org> a écrit :
    
    > Vladimir,
    >
    > You’ve made your points. And I hear them.
    >
    > However I get the impression that you are not open to persuasion. Which
    > means that I am wasting my time trying to reach consensus with you. Which
    > means that people win arguments not on merit, but based upon who is most
    > persistent.
    >
    > Here is my point. Calcite's goal is not to re-create what Oracle or
    > PostgreSQL did ten years later. It is a platform that allows people to
    > write their own data engine. If they want to redefine the “+” operator such
    > that 2 + 2 returns 5, the platform should allow it.
    >
    > Certainly if they want to engineer their own error-handling strategy, we
    > should let them do it. I didn’t have the energy to find an example of a SQL
    > engine that discards rows with divide-by-zero errors, but I believe there
    > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
    > engines that I have worked on that performed ETL-like tasks, all had that
    > capability. And all ETL tools have very flexible error-handling strategies.
    > They are not SQL-based, but Calcite is not exclusively for SQL systems.
    >
    > I have been designing and building world-class data engines for 30 years.
    > Please take me on good faith that a flexible error-handing strategy is a
    > good idea. Don’t force me to bicker over email for hours and hours. When a
    > long discussion leads to the rejection of a contribution, I get
    > considerably closer to burning out.
    >
    > Julian
    >
    >
    > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
    > sitnikov.vladimir@gmail.com> wrote:
    > >
    > > Juilian>Hey, folks. We need your input here.
    > >
    > > Here are my thoughts:
    > > 1) I think the features we add should have at least some level of
    > > consistency
    > > 2) It is much safer to adopt well-known features rather than be pioneers
    > in
    > > the field. I do not mean we must wait for someone else implement and try
    > > out a feature, however I would not rush for implementing a feature that
    > > no-one else explored.
    > >
    > > CALCITE-525 has two key points:
    > > A) Current implementation of enumerable factors code like 0/0 to a static
    > > field of a generated code. It causes the generated code to fail at load
    > > time even before the query is executed.
    > > Of course that is a bug, and I'm even inclined to remove that "static
    > > fields"
    > >
    > > B) Someone (Hongze? Juilan?) suggest to implement a mode to silently
    > ignore
    > > the error (e.g. by ignoring the row or by returning default value).
    > > First of all, I don't think "ignore the row" kind of processing would do
    > > any good to the user since it would not be possible to predict the
    > output.
    > > "ignore the row" is very tricky when join/semijoin/antijoin is there.
    > >
    > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think
    > we
    > > should not rush for it.
    > >
    > > C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
    > > 0)  EMPTY ON ERROR  kind of functions.
    > > That enables to confine the scope of the error, however I don't think it
    > > would be used often (does that mean one would have to wrap each
    > > expression?), and this "catch error" would be non-trivial to propagate to
    > > the downstream executors.
    > > On top of that, we might end up inventing full-blown
    > try-catch-catch-catch
    > > syntax.
    > >
    > > I truly see no business value in implementing B/C, however I do see the
    > > pain it would introduce. It would complicate Calcite maintenance. "B"
    > could
    > > silently produce wrong results, and I'm sure we don't want get results
    > out
    > > of thin air.
    > >
    > > Vladimir
    >
    >

Re: Exception-handling in built-in functions

Posted by Michael Mior <mm...@apache.org>.

My apologies for missing this thread a couple days ago. (Thanks for pinging
it.) Here's my two cents: taking care of contributors to the project is
just as important (if not more important) than taking care of the code. I'm
not saying we should merge terrible code just to keep each other happy, but
I don't think that's the case here. If anyone writes some code which you
disagree with, you should be free to voice your disagreement. However,
especially when the code is from a core contributor and the argument
focuses on potential future problems, I think it's important to consider
that people who have shown dedication to the project over the years are
very likely to be around and willing to fix these problems as they arise.

Code which turns out to cause problems can always be deleted, reverted,
refactored, etc. It's much harder to back out when a contributor is burned
out or interpersonal conflicts get heated.

--
Michael Mior
mmior@apache.org


Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jh...@apache.org> a écrit :

> Vladimir,
>
> You’ve made your points. And I hear them.
>
> However I get the impression that you are not open to persuasion. Which
> means that I am wasting my time trying to reach consensus with you. Which
> means that people win arguments not on merit, but based upon who is most
> persistent.
>
> Here is my point. Calcite's goal is not to re-create what Oracle or
> PostgreSQL did ten years later. It is a platform that allows people to
> write their own data engine. If they want to redefine the “+” operator such
> that 2 + 2 returns 5, the platform should allow it.
>
> Certainly if they want to engineer their own error-handling strategy, we
> should let them do it. I didn’t have the energy to find an example of a SQL
> engine that discards rows with divide-by-zero errors, but I believe there
> is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
> engines that I have worked on that performed ETL-like tasks, all had that
> capability. And all ETL tools have very flexible error-handling strategies.
> They are not SQL-based, but Calcite is not exclusively for SQL systems.
>
> I have been designing and building world-class data engines for 30 years.
> Please take me on good faith that a flexible error-handing strategy is a
> good idea. Don’t force me to bicker over email for hours and hours. When a
> long discussion leads to the rejection of a contribution, I get
> considerably closer to burning out.
>
> Julian
>
>
> > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
> >
> > Juilian>Hey, folks. We need your input here.
> >
> > Here are my thoughts:
> > 1) I think the features we add should have at least some level of
> > consistency
> > 2) It is much safer to adopt well-known features rather than be pioneers
> in
> > the field. I do not mean we must wait for someone else implement and try
> > out a feature, however I would not rush for implementing a feature that
> > no-one else explored.
> >
> > CALCITE-525 has two key points:
> > A) Current implementation of enumerable factors code like 0/0 to a static
> > field of a generated code. It causes the generated code to fail at load
> > time even before the query is executed.
> > Of course that is a bug, and I'm even inclined to remove that "static
> > fields"
> >
> > B) Someone (Hongze? Juilan?) suggest to implement a mode to silently
> ignore
> > the error (e.g. by ignoring the row or by returning default value).
> > First of all, I don't think "ignore the row" kind of processing would do
> > any good to the user since it would not be possible to predict the
> output.
> > "ignore the row" is very tricky when join/semijoin/antijoin is there.
> >
> > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think
> we
> > should not rush for it.
> >
> > C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
> > 0)  EMPTY ON ERROR  kind of functions.
> > That enables to confine the scope of the error, however I don't think it
> > would be used often (does that mean one would have to wrap each
> > expression?), and this "catch error" would be non-trivial to propagate to
> > the downstream executors.
> > On top of that, we might end up inventing full-blown
> try-catch-catch-catch
> > syntax.
> >
> > I truly see no business value in implementing B/C, however I do see the
> > pain it would introduce. It would complicate Calcite maintenance. "B"
> could
> > silently produce wrong results, and I'm sure we don't want get results
> out
> > of thin air.
> >
> > Vladimir
>
>

Re: Exception-handling in built-in functions

Posted by Julian Hyde <jh...@apache.org>.

Vladimir,

You’ve made your points. And I hear them.

However I get the impression that you are not open to persuasion. Which means that I am wasting my time trying to reach consensus with you. Which means that people win arguments not on merit, but based upon who is most persistent.

Here is my point. Calcite's goal is not to re-create what Oracle or PostgreSQL did ten years later. It is a platform that allows people to write their own data engine. If they want to redefine the “+” operator such that 2 + 2 returns 5, the platform should allow it.

Certainly if they want to engineer their own error-handling strategy, we should let them do it. I didn’t have the energy to find an example of a SQL engine that discards rows with divide-by-zero errors, but I believe there is one. I suspect that both Broadbase, SQLstream and Hive, three SQL engines that I have worked on that performed ETL-like tasks, all had that capability. And all ETL tools have very flexible error-handling strategies. They are not SQL-based, but Calcite is not exclusively for SQL systems.

I have been designing and building world-class data engines for 30 years. Please take me on good faith that a flexible error-handing strategy is a good idea. Don’t force me to bicker over email for hours and hours. When a long discussion leads to the rejection of a contribution, I get considerably closer to burning out.

Julian

> On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <si...@gmail.com> wrote:
> 
> Juilian>Hey, folks. We need your input here.
> 
> Here are my thoughts:
> 1) I think the features we add should have at least some level of
> consistency
> 2) It is much safer to adopt well-known features rather than be pioneers in
> the field. I do not mean we must wait for someone else implement and try
> out a feature, however I would not rush for implementing a feature that
> no-one else explored.
> 
> CALCITE-525 has two key points:
> A) Current implementation of enumerable factors code like 0/0 to a static
> field of a generated code. It causes the generated code to fail at load
> time even before the query is executed.
> Of course that is a bug, and I'm even inclined to remove that "static
> fields"
> 
> B) Someone (Hongze? Juilan?) suggest to implement a mode to silently ignore
> the error (e.g. by ignoring the row or by returning default value).
> First of all, I don't think "ignore the row" kind of processing would do
> any good to the user since it would not be possible to predict the output.
> "ignore the row" is very tricky when join/semijoin/antijoin is there.
> 
> I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think we
> should not rush for it.
> 
> C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
> 0)  EMPTY ON ERROR  kind of functions.
> That enables to confine the scope of the error, however I don't think it
> would be used often (does that mean one would have to wrap each
> expression?), and this "catch error" would be non-trivial to propagate to
> the downstream executors.
> On top of that, we might end up inventing full-blown try-catch-catch-catch
> syntax.
> 
> I truly see no business value in implementing B/C, however I do see the
> pain it would introduce. It would complicate Calcite maintenance. "B" could
> silently produce wrong results, and I'm sure we don't want get results out
> of thin air.
> 
> Vladimir

Re: Exception-handling in built-in functions

Posted by Vladimir Sitnikov <si...@gmail.com>.

Juilian>Hey, folks. We need your input here.

Here are my thoughts:
1) I think the features we add should have at least some level of
consistency
2) It is much safer to adopt well-known features rather than be pioneers in
the field. I do not mean we must wait for someone else implement and try
out a feature, however I would not rush for implementing a feature that
no-one else explored.

CALCITE-525 has two key points:
A) Current implementation of enumerable factors code like 0/0 to a static
field of a generated code. It causes the generated code to fail at load
time even before the query is executed.
Of course that is a bug, and I'm even inclined to remove that "static
fields"

B) Someone (Hongze? Juilan?) suggest to implement a mode to silently ignore
the error (e.g. by ignoring the row or by returning default value).
First of all, I don't think "ignore the row" kind of processing would do
any good to the user since it would not be possible to predict the output.
"ignore the row" is very tricky when join/semijoin/antijoin is there.

I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think we
should not rush for it.

C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or CATCH_ERROR(1 /
0)  EMPTY ON ERROR  kind of functions.
That enables to confine the scope of the error, however I don't think it
would be used often (does that mean one would have to wrap each
expression?), and this "catch error" would be non-trivial to propagate to
the downstream executors.
On top of that, we might end up inventing full-blown try-catch-catch-catch
syntax.

I truly see no business value in implementing B/C, however I do see the
pain it would introduce. It would complicate Calcite maintenance. "B" could
silently produce wrong results, and I'm sure we don't want get results out
of thin air.

Vladimir

Re: Exception-handling in built-in functions

Posted by Julian Hyde <jh...@apache.org>.

Hey, folks. We need your input here. This is a crisis.

I do not want us to become a project where people rule by veto, override consensus, and (heaven forbid) back out each other’s commits. But this case is the third instance where this has happened (2438 and 2458 are the other instances, all within the past few weeks) and it is becoming a pattern. I am finding it less and less pleasant to interact with this project.

Julian

> On Oct 15, 2018, at 11:10 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> There’s a discussion on https://issues.apache.org/jira/browse/CALCITE-525 <https://issues.apache.org/jira/browse/CALCITE-525>, "Exception-handling in built-in functions” that seems to be heading for stalemate. There is a PR that I am inclined to accept (with some modifications, probably), and based on his comments so far, Vladimir seems to be set against.
> 
> We had our first commit veto from Vladimir a couple of months ago, and I don’t want to have another.
> 
> How can we work through this and get to consensus?
> 
> Julian
>