You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Konstantin Orlov <ko...@gridgain.com> on 2021/04/29 14:25:27 UTC

RelFieldTrimmer throws an exception in certain cases

Hi all.

I faced a problem preventing certain queries being planned because RelFieldTrimmer throws 
an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds for length 0”.

The problem is here [1]:
    
    // If they are asking for no fields, we can't give them what they want,
    // because zero-column records are illegal. Give them the last field,
    // which is unlikely to be a system field.
    if (fieldsUsed.isEmpty()) {
      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
    }

In case fieldsUsed.isEmpty we returns last field, but it is currently possible that fieldCount=0 as well.  

After some investigation I find out that the reason is empty record derived as row type for Aggregate.
It is possible when an aggregate has an empty group key and no aggregate calls.

So the question is whether an empty record is a legal row type for an aggregation node?

Below is a reproducer for this problem, just put it at RelFieldTrimmerTest:

  @Test void test() {
    class ContextImpl implements Context {
      final Object target;

      ContextImpl(Object target) {
        this.target = Objects.requireNonNull(target, "target");
      }

      @Override public <T extends Object> @Nullable T unwrap(Class<T> clazz) {
        if (clazz.isInstance(target)) {
          return clazz.cast(target);
        }
        return null;
      }
    }

    // RelBuilder hides problem when simplifyValues=true, hence we need to disable it
    final RelBuilder builder = RelBuilder.create(config()
        .context(new ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());

    final RelNode root =
        builder.scan("EMP")
            .aggregate(builder.groupKey())
            .filter(builder.literal(false))
            .project(builder.literal(42))
            .build();

    final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null, builder);
    fieldTrimmer.trim(root); // fails with ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
  }


[1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197

-- 
Regards,
Konstantin Orlov





Re: RelFieldTrimmer throws an exception in certain cases

Posted by Julian Hyde <jh...@gmail.com>.
Thanks, Konstantin.

I have logged https://issues.apache.org/jira/browse/CALCITE-4597 to make the policy configurable. Eventually I would like to allow empty row types throughout the system, but until then, rules and RelFieldTrimmer should follow Postel’s law [1] and accept empty row types but try not to produce them.

Julian

[1] https://en.wikipedia.org/wiki/Robustness_principle

> On May 5, 2021, at 12:51 AM, Konstantin Orlov <ko...@gridgain.com> wrote:
> 
>> Konstantin, can you log it, please
> 
> Yes, sure. Here it is [1]
> 
> [1] https://issues.apache.org/jira/browse/CALCITE-4596 <https://issues.apache.org/jira/browse/CALCITE-4596>
> 
> -- 
> Regards,
> Konstantin Orlov
> 
> 
> 
>> On 4 May 2021, at 21:29, Julian Hyde <jh...@apache.org> wrote:
>> 
>> Regardless of which direction we go (allowing zero-field record types, or disallowing them), Konstantin has found a bug. Konstantin, can you log it, please.
>> 
>> On 2021/04/29 14:25:27, Konstantin Orlov <ko...@gridgain.com> wrote: 
>>> Hi all.
>>> 
>>> I faced a problem preventing certain queries being planned because RelFieldTrimmer throws 
>>> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds for length 0”.
>>> 
>>> The problem is here [1]:
>>> 
>>>   // If they are asking for no fields, we can't give them what they want,
>>>   // because zero-column records are illegal. Give them the last field,
>>>   // which is unlikely to be a system field.
>>>   if (fieldsUsed.isEmpty()) {
>>>     fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>>>   }
>>> 
>>> In case fieldsUsed.isEmpty we returns last field, but it is currently possible that fieldCount=0 as well.  
>>> 
>>> After some investigation I find out that the reason is empty record derived as row type for Aggregate.
>>> It is possible when an aggregate has an empty group key and no aggregate calls.
>>> 
>>> So the question is whether an empty record is a legal row type for an aggregation node?
>>> 
>>> Below is a reproducer for this problem, just put it at RelFieldTrimmerTest:
>>> 
>>> @Test void test() {
>>>   class ContextImpl implements Context {
>>>     final Object target;
>>> 
>>>     ContextImpl(Object target) {
>>>       this.target = Objects.requireNonNull(target, "target");
>>>     }
>>> 
>>>     @Override public <T extends Object> @Nullable T unwrap(Class<T> clazz) {
>>>       if (clazz.isInstance(target)) {
>>>         return clazz.cast(target);
>>>       }
>>>       return null;
>>>     }
>>>   }
>>> 
>>>   // RelBuilder hides problem when simplifyValues=true, hence we need to disable it
>>>   final RelBuilder builder = RelBuilder.create(config()
>>>       .context(new ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
>>> 
>>>   final RelNode root =
>>>       builder.scan("EMP")
>>>           .aggregate(builder.groupKey())
>>>           .filter(builder.literal(false))
>>>           .project(builder.literal(42))
>>>           .build();
>>> 
>>>   final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null, builder);
>>>   fieldTrimmer.trim(root); // fails with ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>>> }
>>> 
>>> 
>>> [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
>>> 
>>> -- 
>>> Regards,
>>> Konstantin Orlov
>>> 
>>> 
>>> 
>>> 
>>> 
> 


Re: RelFieldTrimmer throws an exception in certain cases

Posted by Konstantin Orlov <ko...@gridgain.com>.
> Konstantin, can you log it, please

Yes, sure. Here it is [1]

[1] https://issues.apache.org/jira/browse/CALCITE-4596 <https://issues.apache.org/jira/browse/CALCITE-4596>

-- 
Regards,
Konstantin Orlov



> On 4 May 2021, at 21:29, Julian Hyde <jh...@apache.org> wrote:
> 
> Regardless of which direction we go (allowing zero-field record types, or disallowing them), Konstantin has found a bug. Konstantin, can you log it, please.
> 
> On 2021/04/29 14:25:27, Konstantin Orlov <ko...@gridgain.com> wrote: 
>> Hi all.
>> 
>> I faced a problem preventing certain queries being planned because RelFieldTrimmer throws 
>> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds for length 0”.
>> 
>> The problem is here [1]:
>> 
>>    // If they are asking for no fields, we can't give them what they want,
>>    // because zero-column records are illegal. Give them the last field,
>>    // which is unlikely to be a system field.
>>    if (fieldsUsed.isEmpty()) {
>>      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>>    }
>> 
>> In case fieldsUsed.isEmpty we returns last field, but it is currently possible that fieldCount=0 as well.  
>> 
>> After some investigation I find out that the reason is empty record derived as row type for Aggregate.
>> It is possible when an aggregate has an empty group key and no aggregate calls.
>> 
>> So the question is whether an empty record is a legal row type for an aggregation node?
>> 
>> Below is a reproducer for this problem, just put it at RelFieldTrimmerTest:
>> 
>>  @Test void test() {
>>    class ContextImpl implements Context {
>>      final Object target;
>> 
>>      ContextImpl(Object target) {
>>        this.target = Objects.requireNonNull(target, "target");
>>      }
>> 
>>      @Override public <T extends Object> @Nullable T unwrap(Class<T> clazz) {
>>        if (clazz.isInstance(target)) {
>>          return clazz.cast(target);
>>        }
>>        return null;
>>      }
>>    }
>> 
>>    // RelBuilder hides problem when simplifyValues=true, hence we need to disable it
>>    final RelBuilder builder = RelBuilder.create(config()
>>        .context(new ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
>> 
>>    final RelNode root =
>>        builder.scan("EMP")
>>            .aggregate(builder.groupKey())
>>            .filter(builder.literal(false))
>>            .project(builder.literal(42))
>>            .build();
>> 
>>    final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null, builder);
>>    fieldTrimmer.trim(root); // fails with ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>>  }
>> 
>> 
>> [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
>> 
>> -- 
>> Regards,
>> Konstantin Orlov
>> 
>> 
>> 
>> 
>> 


Re: RelFieldTrimmer throws an exception in certain cases

Posted by Julian Hyde <jh...@apache.org>.
Regardless of which direction we go (allowing zero-field record types, or disallowing them), Konstantin has found a bug. Konstantin, can you log it, please.

On 2021/04/29 14:25:27, Konstantin Orlov <ko...@gridgain.com> wrote: 
> Hi all.
> 
> I faced a problem preventing certain queries being planned because RelFieldTrimmer throws 
> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds for length 0”.
> 
> The problem is here [1]:
>     
>     // If they are asking for no fields, we can't give them what they want,
>     // because zero-column records are illegal. Give them the last field,
>     // which is unlikely to be a system field.
>     if (fieldsUsed.isEmpty()) {
>       fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>     }
> 
> In case fieldsUsed.isEmpty we returns last field, but it is currently possible that fieldCount=0 as well.  
> 
> After some investigation I find out that the reason is empty record derived as row type for Aggregate.
> It is possible when an aggregate has an empty group key and no aggregate calls.
> 
> So the question is whether an empty record is a legal row type for an aggregation node?
> 
> Below is a reproducer for this problem, just put it at RelFieldTrimmerTest:
> 
>   @Test void test() {
>     class ContextImpl implements Context {
>       final Object target;
> 
>       ContextImpl(Object target) {
>         this.target = Objects.requireNonNull(target, "target");
>       }
> 
>       @Override public <T extends Object> @Nullable T unwrap(Class<T> clazz) {
>         if (clazz.isInstance(target)) {
>           return clazz.cast(target);
>         }
>         return null;
>       }
>     }
> 
>     // RelBuilder hides problem when simplifyValues=true, hence we need to disable it
>     final RelBuilder builder = RelBuilder.create(config()
>         .context(new ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
> 
>     final RelNode root =
>         builder.scan("EMP")
>             .aggregate(builder.groupKey())
>             .filter(builder.literal(false))
>             .project(builder.literal(42))
>             .build();
> 
>     final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null, builder);
>     fieldTrimmer.trim(root); // fails with ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>   }
> 
> 
> [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
> 
> -- 
> Regards,
> Konstantin Orlov
> 
> 
> 
> 
> 

Re: RelFieldTrimmer throws an exception in certain cases

Posted by Julian Hyde <jh...@gmail.com>.
Wow, I just ran into another. ValuesNode.createRows (in the interpreter) [1]. What a coincidence!

Julian

[1] https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/interpreter/ValuesNode.java#L56 <https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/interpreter/ValuesNode.java#L56> 

> On May 3, 2021, at 2:55 PM, Julian Hyde <jh...@gmail.com> wrote:
> 
> I am inclined to be brave and idealistic: officially allow records with no fields, and add enough tests that we know that the existing rules can handle them.
> 
> Of course there is a non-zero risk that we will break some things. But these things are already broken, because we zero-field RelNodes can still crop up.
> 
> One place I know that will need to be fixed is 
> 
>    ImmutableList RelBuilder.tupleList(int columnCount, Object[] values)
> 
> because is divides by columnCount.
> 
> Julian
> 
> 
>> On May 3, 2021, at 8:55 AM, Konstantin Orlov <ko...@gridgain.com> wrote:
>> 
>> Hello,
>> 
>> Another case where it will look a bit cleaner with an empty record is an insert with values. 
>> Currently a query like INSERT INTO PROJECT(projectId, name) VALUES (?, ?) have a plan like follow:
>> 
>> TableModify(table=[[PUBLIC, PROJECT]], operation=[INSERT], flattened=[false])
>> Project(PROJECTID=[?1], NAME=[?2])
>>   Values(tuples=[[{ 0 }]])
>> 
>> 
>>> @Konstantin how did you come with the empty aggregate? Was it also a result
>>> of trimming?
>> 
>> The empty aggregate is created for query like this: SELECT 42 FROM my_table HAVING 82 > 42.
>> Although this query is quite synthetic, it’s still grammatically correct, and we need to be able to handle it.
>> 
>> Personally I tend to think that allow the empty records is a better option. For now there is a code that supposed
>> to overcome introduced limitation, but causes another issues.
>> 
>> But since the benefits are not clear and the effort required seems to be bigger, the more rational option is to ban them.
>> 
>> -- 
>> Regards,
>> Konstantin Orlov
>> 
>> 
>> 
>> 
>>> On 2 May 2021, at 01:30, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>> Here's a valid SQL query where an empty projection naturally arises:
>>> 
>>> SELECT d.dname
>>> FROM dept AS d
>>>  JOIN emp AS e ON e.deptno = d.deptno
>>> WHERE d.deptno = 10
>>> 
>>> You want the name of department 10 to be printed N times, where N is
>>> the number of employees in the department, but you don't need any
>>> attributes from the employee table. So it can be optimized to the
>>> following pseudo-SQL:
>>> 
>>> SELECT d.dname
>>> FROM (
>>>  SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
>>> CROSS JOIN (
>>>  SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e
>>> 
>>> If there are 4 employees in department 10, then the emp side of this
>>> query would return 4 empty records:
>>> 
>>> ()
>>> ()
>>> ()
>>> ()
>>> 
>>> An Aggregate with no group keys and no aggregate calls is less
>>> interesting, because it always returns one row, and can therefore be
>>> eliminated.
>>> 
>>> Julian
>>> 
>>> On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> The first operator that comes to mind when we are talking about empty
>>>> records is a projection that projects nothing. In theory and in most
>>>> database books the projection is allowed to have arity 0 but in practice I
>>>> never had to use it; neither aggregate nor any other operator. In fact an
>>>> aggregate with empty group keys and aggregate calls resembles a lot an
>>>> empty projection.
>>>> 
>>>> @Konstantin how did you come with the empty aggregate? Was it also a result
>>>> of trimming?
>>>> 
>>>> If we have to make a decision now I would prefer to disallow them globally
>>>> unless somebody comes up with a compelling use-case.
>>>> In order to express other query languages (e.g,, datalog, conjunctive
>>>> queries, etc) with the algebraic operators we may need 0-arity operators
>>>> but I don't have an example readily in mind.
>>>> 
>>>> Best,
>>>> Stamatis
>>>> 
>>>> On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde <jh...@gmail.com> wrote:
>>>> 
>>>>>> So the question is whether an empty record is a legal row type for an
>>>>> aggregation node?
>>>>> 
>>>>> As that comment indicates, we have tried to avoid empty records — that is,
>>>>> a relational expression that produces a row type with zero fields — but as
>>>>> you have just discovered, we have failed to go all the way.
>>>>> 
>>>>> Mathematically, it is purer to allow empty records. SQL does not allow
>>>>> them, they crop up naturally in quite a lot of corner cases, especially
>>>>> after trimming fields.
>>>>> 
>>>>> Pragmatically, I assumed that quite a lot of code was making the
>>>>> assumption that records were not empty. And that empty records are
>>>>> sufficiently rare that we would never be able to find all of those places
>>>>> via testing.
>>>>> 
>>>>> Is it time to decide? If we allow empty records, we should test that all
>>>>> relational operators can handle them. If we ban them, then we should (say)
>>>>> throw whenever someone registers a RelNode that has an empty row type.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Apr 29, 2021, at 7:25 AM, Konstantin Orlov <ko...@gridgain.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi all.
>>>>>> 
>>>>>> I faced a problem preventing certain queries being planned because
>>>>> RelFieldTrimmer throws
>>>>>> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
>>>>> for length 0”.
>>>>>> 
>>>>>> The problem is here [1]:
>>>>>> 
>>>>>> // If they are asking for no fields, we can't give them what they
>>>>> want,
>>>>>> // because zero-column records are illegal. Give them the last field,
>>>>>> // which is unlikely to be a system field.
>>>>>> if (fieldsUsed.isEmpty()) {
>>>>>>   fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>>>>>> }
>>>>>> 
>>>>>> In case fieldsUsed.isEmpty we returns last field, but it is currently
>>>>> possible that fieldCount=0 as well.
>>>>>> 
>>>>>> After some investigation I find out that the reason is empty record
>>>>> derived as row type for Aggregate.
>>>>>> It is possible when an aggregate has an empty group key and no aggregate
>>>>> calls.
>>>>>> 
>>>>>> So the question is whether an empty record is a legal row type for an
>>>>> aggregation node?
>>>>>> 
>>>>>> Below is a reproducer for this problem, just put it at
>>>>> RelFieldTrimmerTest:
>>>>>> 
>>>>>> @Test void test() {
>>>>>> class ContextImpl implements Context {
>>>>>>   final Object target;
>>>>>> 
>>>>>>   ContextImpl(Object target) {
>>>>>>     this.target = Objects.requireNonNull(target, "target");
>>>>>>   }
>>>>>> 
>>>>>>   @Override public <T extends Object> @Nullable T unwrap(Class<T>
>>>>> clazz) {
>>>>>>     if (clazz.isInstance(target)) {
>>>>>>       return clazz.cast(target);
>>>>>>     }
>>>>>>     return null;
>>>>>>   }
>>>>>> }
>>>>>> 
>>>>>> // RelBuilder hides problem when simplifyValues=true, hence we need
>>>>> to disable it
>>>>>> final RelBuilder builder = RelBuilder.create(config()
>>>>>>     .context(new
>>>>> ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
>>>>>> 
>>>>>> final RelNode root =
>>>>>>     builder.scan("EMP")
>>>>>>         .aggregate(builder.groupKey())
>>>>>>         .filter(builder.literal(false))
>>>>>>         .project(builder.literal(42))
>>>>>>         .build();
>>>>>> 
>>>>>> final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null,
>>>>> builder);
>>>>>> fieldTrimmer.trim(root); // fails with
>>>>> ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> [1]
>>>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
>>>>>> 
>>>>>> --
>>>>>> Regards,
>>>>>> Konstantin Orlov
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>> 
> 


Re: RelFieldTrimmer throws an exception in certain cases

Posted by Julian Hyde <jh...@gmail.com>.
I am inclined to be brave and idealistic: officially allow records with no fields, and add enough tests that we know that the existing rules can handle them.

Of course there is a non-zero risk that we will break some things. But these things are already broken, because we zero-field RelNodes can still crop up.

One place I know that will need to be fixed is 

    ImmutableList RelBuilder.tupleList(int columnCount, Object[] values)

because is divides by columnCount.

Julian


> On May 3, 2021, at 8:55 AM, Konstantin Orlov <ko...@gridgain.com> wrote:
> 
> Hello,
> 
> Another case where it will look a bit cleaner with an empty record is an insert with values. 
> Currently a query like INSERT INTO PROJECT(projectId, name) VALUES (?, ?) have a plan like follow:
> 
> TableModify(table=[[PUBLIC, PROJECT]], operation=[INSERT], flattened=[false])
>  Project(PROJECTID=[?1], NAME=[?2])
>    Values(tuples=[[{ 0 }]])
> 
> 
>> @Konstantin how did you come with the empty aggregate? Was it also a result
>> of trimming?
> 
> The empty aggregate is created for query like this: SELECT 42 FROM my_table HAVING 82 > 42.
> Although this query is quite synthetic, it’s still grammatically correct, and we need to be able to handle it.
> 
> Personally I tend to think that allow the empty records is a better option. For now there is a code that supposed
> to overcome introduced limitation, but causes another issues.
> 
> But since the benefits are not clear and the effort required seems to be bigger, the more rational option is to ban them.
> 
> -- 
> Regards,
> Konstantin Orlov
> 
> 
> 
> 
>> On 2 May 2021, at 01:30, Julian Hyde <jh...@apache.org> wrote:
>> 
>> Here's a valid SQL query where an empty projection naturally arises:
>> 
>> SELECT d.dname
>> FROM dept AS d
>>   JOIN emp AS e ON e.deptno = d.deptno
>> WHERE d.deptno = 10
>> 
>> You want the name of department 10 to be printed N times, where N is
>> the number of employees in the department, but you don't need any
>> attributes from the employee table. So it can be optimized to the
>> following pseudo-SQL:
>> 
>> SELECT d.dname
>> FROM (
>>   SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
>> CROSS JOIN (
>>   SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e
>> 
>> If there are 4 employees in department 10, then the emp side of this
>> query would return 4 empty records:
>> 
>> ()
>> ()
>> ()
>> ()
>> 
>> An Aggregate with no group keys and no aggregate calls is less
>> interesting, because it always returns one row, and can therefore be
>> eliminated.
>> 
>> Julian
>> 
>> On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> The first operator that comes to mind when we are talking about empty
>>> records is a projection that projects nothing. In theory and in most
>>> database books the projection is allowed to have arity 0 but in practice I
>>> never had to use it; neither aggregate nor any other operator. In fact an
>>> aggregate with empty group keys and aggregate calls resembles a lot an
>>> empty projection.
>>> 
>>> @Konstantin how did you come with the empty aggregate? Was it also a result
>>> of trimming?
>>> 
>>> If we have to make a decision now I would prefer to disallow them globally
>>> unless somebody comes up with a compelling use-case.
>>> In order to express other query languages (e.g,, datalog, conjunctive
>>> queries, etc) with the algebraic operators we may need 0-arity operators
>>> but I don't have an example readily in mind.
>>> 
>>> Best,
>>> Stamatis
>>> 
>>> On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde <jh...@gmail.com> wrote:
>>> 
>>>>> So the question is whether an empty record is a legal row type for an
>>>> aggregation node?
>>>> 
>>>> As that comment indicates, we have tried to avoid empty records — that is,
>>>> a relational expression that produces a row type with zero fields — but as
>>>> you have just discovered, we have failed to go all the way.
>>>> 
>>>> Mathematically, it is purer to allow empty records. SQL does not allow
>>>> them, they crop up naturally in quite a lot of corner cases, especially
>>>> after trimming fields.
>>>> 
>>>> Pragmatically, I assumed that quite a lot of code was making the
>>>> assumption that records were not empty. And that empty records are
>>>> sufficiently rare that we would never be able to find all of those places
>>>> via testing.
>>>> 
>>>> Is it time to decide? If we allow empty records, we should test that all
>>>> relational operators can handle them. If we ban them, then we should (say)
>>>> throw whenever someone registers a RelNode that has an empty row type.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Apr 29, 2021, at 7:25 AM, Konstantin Orlov <ko...@gridgain.com>
>>>> wrote:
>>>>> 
>>>>> Hi all.
>>>>> 
>>>>> I faced a problem preventing certain queries being planned because
>>>> RelFieldTrimmer throws
>>>>> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
>>>> for length 0”.
>>>>> 
>>>>> The problem is here [1]:
>>>>> 
>>>>>  // If they are asking for no fields, we can't give them what they
>>>> want,
>>>>>  // because zero-column records are illegal. Give them the last field,
>>>>>  // which is unlikely to be a system field.
>>>>>  if (fieldsUsed.isEmpty()) {
>>>>>    fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>>>>>  }
>>>>> 
>>>>> In case fieldsUsed.isEmpty we returns last field, but it is currently
>>>> possible that fieldCount=0 as well.
>>>>> 
>>>>> After some investigation I find out that the reason is empty record
>>>> derived as row type for Aggregate.
>>>>> It is possible when an aggregate has an empty group key and no aggregate
>>>> calls.
>>>>> 
>>>>> So the question is whether an empty record is a legal row type for an
>>>> aggregation node?
>>>>> 
>>>>> Below is a reproducer for this problem, just put it at
>>>> RelFieldTrimmerTest:
>>>>> 
>>>>> @Test void test() {
>>>>>  class ContextImpl implements Context {
>>>>>    final Object target;
>>>>> 
>>>>>    ContextImpl(Object target) {
>>>>>      this.target = Objects.requireNonNull(target, "target");
>>>>>    }
>>>>> 
>>>>>    @Override public <T extends Object> @Nullable T unwrap(Class<T>
>>>> clazz) {
>>>>>      if (clazz.isInstance(target)) {
>>>>>        return clazz.cast(target);
>>>>>      }
>>>>>      return null;
>>>>>    }
>>>>>  }
>>>>> 
>>>>>  // RelBuilder hides problem when simplifyValues=true, hence we need
>>>> to disable it
>>>>>  final RelBuilder builder = RelBuilder.create(config()
>>>>>      .context(new
>>>> ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
>>>>> 
>>>>>  final RelNode root =
>>>>>      builder.scan("EMP")
>>>>>          .aggregate(builder.groupKey())
>>>>>          .filter(builder.literal(false))
>>>>>          .project(builder.literal(42))
>>>>>          .build();
>>>>> 
>>>>>  final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null,
>>>> builder);
>>>>>  fieldTrimmer.trim(root); // fails with
>>>> ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>>>>> }
>>>>> 
>>>>> 
>>>>> [1]
>>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
>>>>> 
>>>>> --
>>>>> Regards,
>>>>> Konstantin Orlov
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
> 


Re: RelFieldTrimmer throws an exception in certain cases

Posted by Konstantin Orlov <ko...@gridgain.com>.
Hello,

Another case where it will look a bit cleaner with an empty record is an insert with values. 
Currently a query like INSERT INTO PROJECT(projectId, name) VALUES (?, ?) have a plan like follow:

TableModify(table=[[PUBLIC, PROJECT]], operation=[INSERT], flattened=[false])
  Project(PROJECTID=[?1], NAME=[?2])
    Values(tuples=[[{ 0 }]])


> @Konstantin how did you come with the empty aggregate? Was it also a result
> of trimming?

The empty aggregate is created for query like this: SELECT 42 FROM my_table HAVING 82 > 42.
Although this query is quite synthetic, it’s still grammatically correct, and we need to be able to handle it.

Personally I tend to think that allow the empty records is a better option. For now there is a code that supposed
to overcome introduced limitation, but causes another issues.

But since the benefits are not clear and the effort required seems to be bigger, the more rational option is to ban them.

-- 
Regards,
Konstantin Orlov




> On 2 May 2021, at 01:30, Julian Hyde <jh...@apache.org> wrote:
> 
> Here's a valid SQL query where an empty projection naturally arises:
> 
>  SELECT d.dname
>  FROM dept AS d
>    JOIN emp AS e ON e.deptno = d.deptno
>  WHERE d.deptno = 10
> 
> You want the name of department 10 to be printed N times, where N is
> the number of employees in the department, but you don't need any
> attributes from the employee table. So it can be optimized to the
> following pseudo-SQL:
> 
>  SELECT d.dname
>  FROM (
>    SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
>  CROSS JOIN (
>    SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e
> 
> If there are 4 employees in department 10, then the emp side of this
> query would return 4 empty records:
> 
>  ()
>  ()
>  ()
>  ()
> 
> An Aggregate with no group keys and no aggregate calls is less
> interesting, because it always returns one row, and can therefore be
> eliminated.
> 
> Julian
> 
> On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> The first operator that comes to mind when we are talking about empty
>> records is a projection that projects nothing. In theory and in most
>> database books the projection is allowed to have arity 0 but in practice I
>> never had to use it; neither aggregate nor any other operator. In fact an
>> aggregate with empty group keys and aggregate calls resembles a lot an
>> empty projection.
>> 
>> @Konstantin how did you come with the empty aggregate? Was it also a result
>> of trimming?
>> 
>> If we have to make a decision now I would prefer to disallow them globally
>> unless somebody comes up with a compelling use-case.
>> In order to express other query languages (e.g,, datalog, conjunctive
>> queries, etc) with the algebraic operators we may need 0-arity operators
>> but I don't have an example readily in mind.
>> 
>> Best,
>> Stamatis
>> 
>> On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde <jh...@gmail.com> wrote:
>> 
>>>> So the question is whether an empty record is a legal row type for an
>>> aggregation node?
>>> 
>>> As that comment indicates, we have tried to avoid empty records — that is,
>>> a relational expression that produces a row type with zero fields — but as
>>> you have just discovered, we have failed to go all the way.
>>> 
>>> Mathematically, it is purer to allow empty records. SQL does not allow
>>> them, they crop up naturally in quite a lot of corner cases, especially
>>> after trimming fields.
>>> 
>>> Pragmatically, I assumed that quite a lot of code was making the
>>> assumption that records were not empty. And that empty records are
>>> sufficiently rare that we would never be able to find all of those places
>>> via testing.
>>> 
>>> Is it time to decide? If we allow empty records, we should test that all
>>> relational operators can handle them. If we ban them, then we should (say)
>>> throw whenever someone registers a RelNode that has an empty row type.
>>> 
>>> Julian
>>> 
>>> 
>>> 
>>> 
>>>> On Apr 29, 2021, at 7:25 AM, Konstantin Orlov <ko...@gridgain.com>
>>> wrote:
>>>> 
>>>> Hi all.
>>>> 
>>>> I faced a problem preventing certain queries being planned because
>>> RelFieldTrimmer throws
>>>> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
>>> for length 0”.
>>>> 
>>>> The problem is here [1]:
>>>> 
>>>>   // If they are asking for no fields, we can't give them what they
>>> want,
>>>>   // because zero-column records are illegal. Give them the last field,
>>>>   // which is unlikely to be a system field.
>>>>   if (fieldsUsed.isEmpty()) {
>>>>     fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>>>>   }
>>>> 
>>>> In case fieldsUsed.isEmpty we returns last field, but it is currently
>>> possible that fieldCount=0 as well.
>>>> 
>>>> After some investigation I find out that the reason is empty record
>>> derived as row type for Aggregate.
>>>> It is possible when an aggregate has an empty group key and no aggregate
>>> calls.
>>>> 
>>>> So the question is whether an empty record is a legal row type for an
>>> aggregation node?
>>>> 
>>>> Below is a reproducer for this problem, just put it at
>>> RelFieldTrimmerTest:
>>>> 
>>>> @Test void test() {
>>>>   class ContextImpl implements Context {
>>>>     final Object target;
>>>> 
>>>>     ContextImpl(Object target) {
>>>>       this.target = Objects.requireNonNull(target, "target");
>>>>     }
>>>> 
>>>>     @Override public <T extends Object> @Nullable T unwrap(Class<T>
>>> clazz) {
>>>>       if (clazz.isInstance(target)) {
>>>>         return clazz.cast(target);
>>>>       }
>>>>       return null;
>>>>     }
>>>>   }
>>>> 
>>>>   // RelBuilder hides problem when simplifyValues=true, hence we need
>>> to disable it
>>>>   final RelBuilder builder = RelBuilder.create(config()
>>>>       .context(new
>>> ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
>>>> 
>>>>   final RelNode root =
>>>>       builder.scan("EMP")
>>>>           .aggregate(builder.groupKey())
>>>>           .filter(builder.literal(false))
>>>>           .project(builder.literal(42))
>>>>           .build();
>>>> 
>>>>   final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null,
>>> builder);
>>>>   fieldTrimmer.trim(root); // fails with
>>> ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>>>> }
>>>> 
>>>> 
>>>> [1]
>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
>>>> 
>>>> --
>>>> Regards,
>>>> Konstantin Orlov
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 


Re: RelFieldTrimmer throws an exception in certain cases

Posted by Julian Hyde <jh...@apache.org>.
Here's a valid SQL query where an empty projection naturally arises:

  SELECT d.dname
  FROM dept AS d
    JOIN emp AS e ON e.deptno = d.deptno
  WHERE d.deptno = 10

You want the name of department 10 to be printed N times, where N is
the number of employees in the department, but you don't need any
attributes from the employee table. So it can be optimized to the
following pseudo-SQL:

  SELECT d.dname
  FROM (
    SELECT d.deptno, d.dname FROM dept WHERE deptno = 10) AS d
  CROSS JOIN (
    SELECT /* no columns */ FROM emp WHERE deptno = 10) AS e

If there are 4 employees in department 10, then the emp side of this
query would return 4 empty records:

  ()
  ()
  ()
  ()

An Aggregate with no group keys and no aggregate calls is less
interesting, because it always returns one row, and can therefore be
eliminated.

Julian

On Fri, Apr 30, 2021 at 2:53 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>
> Hello,
>
> The first operator that comes to mind when we are talking about empty
> records is a projection that projects nothing. In theory and in most
> database books the projection is allowed to have arity 0 but in practice I
> never had to use it; neither aggregate nor any other operator. In fact an
> aggregate with empty group keys and aggregate calls resembles a lot an
> empty projection.
>
> @Konstantin how did you come with the empty aggregate? Was it also a result
> of trimming?
>
> If we have to make a decision now I would prefer to disallow them globally
> unless somebody comes up with a compelling use-case.
> In order to express other query languages (e.g,, datalog, conjunctive
> queries, etc) with the algebraic operators we may need 0-arity operators
> but I don't have an example readily in mind.
>
> Best,
> Stamatis
>
> On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde <jh...@gmail.com> wrote:
>
> > > So the question is whether an empty record is a legal row type for an
> > aggregation node?
> >
> > As that comment indicates, we have tried to avoid empty records — that is,
> > a relational expression that produces a row type with zero fields — but as
> > you have just discovered, we have failed to go all the way.
> >
> > Mathematically, it is purer to allow empty records. SQL does not allow
> > them, they crop up naturally in quite a lot of corner cases, especially
> > after trimming fields.
> >
> > Pragmatically, I assumed that quite a lot of code was making the
> > assumption that records were not empty. And that empty records are
> > sufficiently rare that we would never be able to find all of those places
> > via testing.
> >
> > Is it time to decide? If we allow empty records, we should test that all
> > relational operators can handle them. If we ban them, then we should (say)
> > throw whenever someone registers a RelNode that has an empty row type.
> >
> > Julian
> >
> >
> >
> >
> > > On Apr 29, 2021, at 7:25 AM, Konstantin Orlov <ko...@gridgain.com>
> > wrote:
> > >
> > > Hi all.
> > >
> > > I faced a problem preventing certain queries being planned because
> > RelFieldTrimmer throws
> > > an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
> > for length 0”.
> > >
> > > The problem is here [1]:
> > >
> > >    // If they are asking for no fields, we can't give them what they
> > want,
> > >    // because zero-column records are illegal. Give them the last field,
> > >    // which is unlikely to be a system field.
> > >    if (fieldsUsed.isEmpty()) {
> > >      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
> > >    }
> > >
> > > In case fieldsUsed.isEmpty we returns last field, but it is currently
> > possible that fieldCount=0 as well.
> > >
> > > After some investigation I find out that the reason is empty record
> > derived as row type for Aggregate.
> > > It is possible when an aggregate has an empty group key and no aggregate
> > calls.
> > >
> > > So the question is whether an empty record is a legal row type for an
> > aggregation node?
> > >
> > > Below is a reproducer for this problem, just put it at
> > RelFieldTrimmerTest:
> > >
> > >  @Test void test() {
> > >    class ContextImpl implements Context {
> > >      final Object target;
> > >
> > >      ContextImpl(Object target) {
> > >        this.target = Objects.requireNonNull(target, "target");
> > >      }
> > >
> > >      @Override public <T extends Object> @Nullable T unwrap(Class<T>
> > clazz) {
> > >        if (clazz.isInstance(target)) {
> > >          return clazz.cast(target);
> > >        }
> > >        return null;
> > >      }
> > >    }
> > >
> > >    // RelBuilder hides problem when simplifyValues=true, hence we need
> > to disable it
> > >    final RelBuilder builder = RelBuilder.create(config()
> > >        .context(new
> > ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
> > >
> > >    final RelNode root =
> > >        builder.scan("EMP")
> > >            .aggregate(builder.groupKey())
> > >            .filter(builder.literal(false))
> > >            .project(builder.literal(42))
> > >            .build();
> > >
> > >    final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null,
> > builder);
> > >    fieldTrimmer.trim(root); // fails with
> > ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
> > >  }
> > >
> > >
> > > [1]
> > https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
> > >
> > > --
> > > Regards,
> > > Konstantin Orlov
> > >
> > >
> > >
> > >
> >
> >

Re: RelFieldTrimmer throws an exception in certain cases

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hello,

The first operator that comes to mind when we are talking about empty
records is a projection that projects nothing. In theory and in most
database books the projection is allowed to have arity 0 but in practice I
never had to use it; neither aggregate nor any other operator. In fact an
aggregate with empty group keys and aggregate calls resembles a lot an
empty projection.

@Konstantin how did you come with the empty aggregate? Was it also a result
of trimming?

If we have to make a decision now I would prefer to disallow them globally
unless somebody comes up with a compelling use-case.
In order to express other query languages (e.g,, datalog, conjunctive
queries, etc) with the algebraic operators we may need 0-arity operators
but I don't have an example readily in mind.

Best,
Stamatis

On Thu, Apr 29, 2021 at 7:07 PM Julian Hyde <jh...@gmail.com> wrote:

> > So the question is whether an empty record is a legal row type for an
> aggregation node?
>
> As that comment indicates, we have tried to avoid empty records — that is,
> a relational expression that produces a row type with zero fields — but as
> you have just discovered, we have failed to go all the way.
>
> Mathematically, it is purer to allow empty records. SQL does not allow
> them, they crop up naturally in quite a lot of corner cases, especially
> after trimming fields.
>
> Pragmatically, I assumed that quite a lot of code was making the
> assumption that records were not empty. And that empty records are
> sufficiently rare that we would never be able to find all of those places
> via testing.
>
> Is it time to decide? If we allow empty records, we should test that all
> relational operators can handle them. If we ban them, then we should (say)
> throw whenever someone registers a RelNode that has an empty row type.
>
> Julian
>
>
>
>
> > On Apr 29, 2021, at 7:25 AM, Konstantin Orlov <ko...@gridgain.com>
> wrote:
> >
> > Hi all.
> >
> > I faced a problem preventing certain queries being planned because
> RelFieldTrimmer throws
> > an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds
> for length 0”.
> >
> > The problem is here [1]:
> >
> >    // If they are asking for no fields, we can't give them what they
> want,
> >    // because zero-column records are illegal. Give them the last field,
> >    // which is unlikely to be a system field.
> >    if (fieldsUsed.isEmpty()) {
> >      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
> >    }
> >
> > In case fieldsUsed.isEmpty we returns last field, but it is currently
> possible that fieldCount=0 as well.
> >
> > After some investigation I find out that the reason is empty record
> derived as row type for Aggregate.
> > It is possible when an aggregate has an empty group key and no aggregate
> calls.
> >
> > So the question is whether an empty record is a legal row type for an
> aggregation node?
> >
> > Below is a reproducer for this problem, just put it at
> RelFieldTrimmerTest:
> >
> >  @Test void test() {
> >    class ContextImpl implements Context {
> >      final Object target;
> >
> >      ContextImpl(Object target) {
> >        this.target = Objects.requireNonNull(target, "target");
> >      }
> >
> >      @Override public <T extends Object> @Nullable T unwrap(Class<T>
> clazz) {
> >        if (clazz.isInstance(target)) {
> >          return clazz.cast(target);
> >        }
> >        return null;
> >      }
> >    }
> >
> >    // RelBuilder hides problem when simplifyValues=true, hence we need
> to disable it
> >    final RelBuilder builder = RelBuilder.create(config()
> >        .context(new
> ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
> >
> >    final RelNode root =
> >        builder.scan("EMP")
> >            .aggregate(builder.groupKey())
> >            .filter(builder.literal(false))
> >            .project(builder.literal(42))
> >            .build();
> >
> >    final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null,
> builder);
> >    fieldTrimmer.trim(root); // fails with
> ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
> >  }
> >
> >
> > [1]
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
> >
> > --
> > Regards,
> > Konstantin Orlov
> >
> >
> >
> >
>
>

Re: RelFieldTrimmer throws an exception in certain cases

Posted by Julian Hyde <jh...@gmail.com>.
> So the question is whether an empty record is a legal row type for an aggregation node?

As that comment indicates, we have tried to avoid empty records — that is, a relational expression that produces a row type with zero fields — but as you have just discovered, we have failed to go all the way.

Mathematically, it is purer to allow empty records. SQL does not allow them, they crop up naturally in quite a lot of corner cases, especially after trimming fields.

Pragmatically, I assumed that quite a lot of code was making the assumption that records were not empty. And that empty records are sufficiently rare that we would never be able to find all of those places via testing.

Is it time to decide? If we allow empty records, we should test that all relational operators can handle them. If we ban them, then we should (say) throw whenever someone registers a RelNode that has an empty row type.

Julian

 


> On Apr 29, 2021, at 7:25 AM, Konstantin Orlov <ko...@gridgain.com> wrote:
> 
> Hi all.
> 
> I faced a problem preventing certain queries being planned because RelFieldTrimmer throws 
> an ArrayIndexOutOfBoundsException with message "Index -1 out of bounds for length 0”.
> 
> The problem is here [1]:
> 
>    // If they are asking for no fields, we can't give them what they want,
>    // because zero-column records are illegal. Give them the last field,
>    // which is unlikely to be a system field.
>    if (fieldsUsed.isEmpty()) {
>      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>    }
> 
> In case fieldsUsed.isEmpty we returns last field, but it is currently possible that fieldCount=0 as well.  
> 
> After some investigation I find out that the reason is empty record derived as row type for Aggregate.
> It is possible when an aggregate has an empty group key and no aggregate calls.
> 
> So the question is whether an empty record is a legal row type for an aggregation node?
> 
> Below is a reproducer for this problem, just put it at RelFieldTrimmerTest:
> 
>  @Test void test() {
>    class ContextImpl implements Context {
>      final Object target;
> 
>      ContextImpl(Object target) {
>        this.target = Objects.requireNonNull(target, "target");
>      }
> 
>      @Override public <T extends Object> @Nullable T unwrap(Class<T> clazz) {
>        if (clazz.isInstance(target)) {
>          return clazz.cast(target);
>        }
>        return null;
>      }
>    }
> 
>    // RelBuilder hides problem when simplifyValues=true, hence we need to disable it
>    final RelBuilder builder = RelBuilder.create(config()
>        .context(new ContextImpl(RelBuilder.Config.DEFAULT.withSimplifyValues(false))).build());
> 
>    final RelNode root =
>        builder.scan("EMP")
>            .aggregate(builder.groupKey())
>            .filter(builder.literal(false))
>            .project(builder.literal(42))
>            .build();
> 
>    final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null, builder);
>    fieldTrimmer.trim(root); // fails with ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
>  }
> 
> 
> [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
> 
> -- 
> Regards,
> Konstantin Orlov
> 
> 
> 
>