You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Li Yang <li...@apache.org> on 2015/11/13 04:30:21 UTC

About NULL's position in order by

Encountered a seemingly inconsistent behavior about NULL in order by. Could
anyone shed some light? (calcite 1.4 release)

When a column contains NULL value and order by it desc, the NULL is
positioned at the end of result.

E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC"  returns

2013-08-11 US ....
2013-08-18 FR  ....
2013-03-31 null ....
2013-11-10 null ....

However if order by one more column, the NULL will come to the top this
time.

E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC, CAL_DT
ASC"  returns

2013-03-31 null ....
2013-11-10 null ....
2013-08-11 US ....
2013-08-18 FR  ....

I understand NULL's order comparing to other values is vague in SQL, either
putting it first or last is fine. However do expect the order is
consistent. From a user point of view, if it seems NULL is the smallest,
then it should always be so.

Thanks

Re: About NULL's position in order by

Posted by Julian Hyde <jh...@apache.org>.
I have logged https://issues.apache.org/jira/browse/CALCITE-981 to track this.

> On Nov 18, 2015, at 10:28 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> 
>> On Nov 16, 2015, at 6:12 PM, Jinfeng Ni <ji...@gmail.com> wrote:
>> 
>> The only concern I have is about JDBC data source that Drill is
>> pushing to.  If we get the sort order fully specified in planning
>> time, then the same query might see different results when it goes
>> through the native engine (which uses a different sort order), or it
>> goes through Drill.
> 
> Agreed. Some work on the JDBC adapter will be needed. If the JDBC source supports the desired null direction (e.g. has NULLS FIRST/LAST syntax) we should generate it. If the column is NOT NULL, it’s not a problem. With Mondrian we ended up generating "ORDER BY ISNULL(x), x” on MySQL[1].
> 
> Julian
> 
> [1] http://jira.pentaho.com/browse/MONDRIAN-226
> 


Re: About NULL's position in order by

Posted by Julian Hyde <jh...@apache.org>.
> On Nov 16, 2015, at 6:12 PM, Jinfeng Ni <ji...@gmail.com> wrote:
> 
> The only concern I have is about JDBC data source that Drill is
> pushing to.  If we get the sort order fully specified in planning
> time, then the same query might see different results when it goes
> through the native engine (which uses a different sort order), or it
> goes through Drill.

Agreed. Some work on the JDBC adapter will be needed. If the JDBC source supports the desired null direction (e.g. has NULLS FIRST/LAST syntax) we should generate it. If the column is NOT NULL, it’s not a problem. With Mondrian we ended up generating "ORDER BY ISNULL(x), x” on MySQL[1].

Julian

[1] http://jira.pentaho.com/browse/MONDRIAN-226


Re: About NULL's position in order by

Posted by Jinfeng Ni <ji...@gmail.com>.
It makes sense to me to get everything fully specified at planning
time. I agree with you that the proposed change would not impact the
Drill's execution, since Drill currently already uses "nulls high".

The only concern I have is about JDBC data source that Drill is
pushing to.  If we get the sort order fully specified in planning
time, then the same query might see different results when it goes
through the native engine (which uses a different sort order), or it
goes through Drill.

On Mon, Nov 16, 2015 at 2:38 PM, Julian Hyde <jh...@apache.org> wrote:
> Currently the null direction is resolved at execution time. As you noted, you get an UNSPECIFIED in LogicalSortRel.
>
> After 970, the null direction is resolved at sql-to-rel time. This would solve several problems, including the problem that different engines (say Drill and a JDBC data source that Drill is pushing to) might have a different sort order. In my opinion it is better if, at planning time, everything is fully specified.
>
> It sounds as if Drill is already using the proposed “nulls high” policy, so I don’t think there would be any change in the behavior as seen by end users.
>
> I tried to minimize changes to EXPLAIN. I changed the RelFieldCollation.toString() method (and some other places) so that “1 asc nulls last” prints as “1” (previously it was “1 asc unspecified” that printed as “1”). You should not have to make many changes to expected test output, even though the contents of RelFieldCollation have changed.
>
> Julian
>
>
>> On Nov 16, 2015, at 1:39 PM, Jinfeng Ni <ji...@gmail.com> wrote:
>>
>> Julian,
>>
>> I'm trying to see if CALCITE-969/970 may have any impact on Drilll side.
>>
>> I assume that the following logic happens in Calcite's execution side, right?
>> " Currently
>>
>>  ORDER BY x DESC
>>
>> is equivalent to
>>
>>  ORDER BY x DESC NULLS LAST
>> "
>>
>> On 1.4.0,  for "ORDER BY x DECS", I saw the nullDirection is
>> "UNSPECIFIED" in LogicalSortRel. I did not see "NULLS LAST" added to
>> "ORDER BY x DESC" in Calcite logical world.
>>
>> Drill's execution would interpret "UNSPECIFIED" as "NULL FIRST for
>> DESC" and "NULL last for ASC" [1].  So, I assume the thing you talked
>> above is also happening in Calcite's execution time. That's, you will
>> still keep "UNSPECIFIED" in LogicalSort, if user does not specify
>> nullDirection, and leave each different execution engine to choose
>> different interpretation.
>>
>> Thanks,
>>
>> Jinfeng
>>
>>
>> [1]. https://github.com/apache/drill/blob/master/logical/src/main/java/org/apache/drill/common/logical/data/Order.java#L195-L202
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Nov 16, 2015 at 11:50 AM, Julian Hyde <jh...@apache.org> wrote:
>>>
>>>
>>>> On Nov 15, 2015, at 6:50 PM, Li Yang <li...@apache.org> wrote:
>>>>
>>>> https://issues.apache.org/jira/browse/CALCITE-969
>>>
>>> Li Yang, Thanks for logging that.
>>>
>>> I realized that we have two issues, so I logged https://issues.apache.org/jira/browse/CALCITE-970 for the other part. 970 makes NULL values sort HIGH, by default, which is consistent with Oracle.
>>>
>>> But it will change some SQL behavior. Currently
>>>
>>>  ORDER BY x DESC
>>>
>>> is equivalent to
>>>
>>>  ORDER BY x DESC NULLS LAST
>>>
>>> but after 970 it will be equivalent to
>>>
>>>  ORDER BY x DESC NULLS FIRST
>>>
>>> If you like the current behavior, you can set defaultNullCollation=LAST. Also, I tightened up what happens in RelNode land; we now use NullDirection.UNSPECIFIED a lot less than we used to.
>>>
>>> Please holler if you don’t like 970.
>>>
>>> Julian
>>>
>

Re: About NULL's position in order by

Posted by Julian Hyde <jh...@apache.org>.
Currently the null direction is resolved at execution time. As you noted, you get an UNSPECIFIED in LogicalSortRel.

After 970, the null direction is resolved at sql-to-rel time. This would solve several problems, including the problem that different engines (say Drill and a JDBC data source that Drill is pushing to) might have a different sort order. In my opinion it is better if, at planning time, everything is fully specified.

It sounds as if Drill is already using the proposed “nulls high” policy, so I don’t think there would be any change in the behavior as seen by end users.

I tried to minimize changes to EXPLAIN. I changed the RelFieldCollation.toString() method (and some other places) so that “1 asc nulls last” prints as “1” (previously it was “1 asc unspecified” that printed as “1”). You should not have to make many changes to expected test output, even though the contents of RelFieldCollation have changed.

Julian


> On Nov 16, 2015, at 1:39 PM, Jinfeng Ni <ji...@gmail.com> wrote:
> 
> Julian,
> 
> I'm trying to see if CALCITE-969/970 may have any impact on Drilll side.
> 
> I assume that the following logic happens in Calcite's execution side, right?
> " Currently
> 
>  ORDER BY x DESC
> 
> is equivalent to
> 
>  ORDER BY x DESC NULLS LAST
> "
> 
> On 1.4.0,  for "ORDER BY x DECS", I saw the nullDirection is
> "UNSPECIFIED" in LogicalSortRel. I did not see "NULLS LAST" added to
> "ORDER BY x DESC" in Calcite logical world.
> 
> Drill's execution would interpret "UNSPECIFIED" as "NULL FIRST for
> DESC" and "NULL last for ASC" [1].  So, I assume the thing you talked
> above is also happening in Calcite's execution time. That's, you will
> still keep "UNSPECIFIED" in LogicalSort, if user does not specify
> nullDirection, and leave each different execution engine to choose
> different interpretation.
> 
> Thanks,
> 
> Jinfeng
> 
> 
> [1]. https://github.com/apache/drill/blob/master/logical/src/main/java/org/apache/drill/common/logical/data/Order.java#L195-L202
> 
> 
> 
> 
> 
> 
> 
> On Mon, Nov 16, 2015 at 11:50 AM, Julian Hyde <jh...@apache.org> wrote:
>> 
>> 
>>> On Nov 15, 2015, at 6:50 PM, Li Yang <li...@apache.org> wrote:
>>> 
>>> https://issues.apache.org/jira/browse/CALCITE-969
>> 
>> Li Yang, Thanks for logging that.
>> 
>> I realized that we have two issues, so I logged https://issues.apache.org/jira/browse/CALCITE-970 for the other part. 970 makes NULL values sort HIGH, by default, which is consistent with Oracle.
>> 
>> But it will change some SQL behavior. Currently
>> 
>>  ORDER BY x DESC
>> 
>> is equivalent to
>> 
>>  ORDER BY x DESC NULLS LAST
>> 
>> but after 970 it will be equivalent to
>> 
>>  ORDER BY x DESC NULLS FIRST
>> 
>> If you like the current behavior, you can set defaultNullCollation=LAST. Also, I tightened up what happens in RelNode land; we now use NullDirection.UNSPECIFIED a lot less than we used to.
>> 
>> Please holler if you don’t like 970.
>> 
>> Julian
>> 


Re: About NULL's position in order by

Posted by Jinfeng Ni <ji...@gmail.com>.
Julian,

I'm trying to see if CALCITE-969/970 may have any impact on Drilll side.

I assume that the following logic happens in Calcite's execution side, right?
" Currently

  ORDER BY x DESC

is equivalent to

  ORDER BY x DESC NULLS LAST
"

On 1.4.0,  for "ORDER BY x DECS", I saw the nullDirection is
"UNSPECIFIED" in LogicalSortRel. I did not see "NULLS LAST" added to
"ORDER BY x DESC" in Calcite logical world.

Drill's execution would interpret "UNSPECIFIED" as "NULL FIRST for
DESC" and "NULL last for ASC" [1].  So, I assume the thing you talked
above is also happening in Calcite's execution time. That's, you will
still keep "UNSPECIFIED" in LogicalSort, if user does not specify
nullDirection, and leave each different execution engine to choose
different interpretation.

Thanks,

Jinfeng


[1]. https://github.com/apache/drill/blob/master/logical/src/main/java/org/apache/drill/common/logical/data/Order.java#L195-L202







On Mon, Nov 16, 2015 at 11:50 AM, Julian Hyde <jh...@apache.org> wrote:
>
>
>> On Nov 15, 2015, at 6:50 PM, Li Yang <li...@apache.org> wrote:
>>
>> https://issues.apache.org/jira/browse/CALCITE-969
>
> Li Yang, Thanks for logging that.
>
> I realized that we have two issues, so I logged https://issues.apache.org/jira/browse/CALCITE-970 for the other part. 970 makes NULL values sort HIGH, by default, which is consistent with Oracle.
>
> But it will change some SQL behavior. Currently
>
>   ORDER BY x DESC
>
> is equivalent to
>
>   ORDER BY x DESC NULLS LAST
>
> but after 970 it will be equivalent to
>
>   ORDER BY x DESC NULLS FIRST
>
> If you like the current behavior, you can set defaultNullCollation=LAST. Also, I tightened up what happens in RelNode land; we now use NullDirection.UNSPECIFIED a lot less than we used to.
>
> Please holler if you don’t like 970.
>
> Julian
>

Re: About NULL's position in order by

Posted by Julian Hyde <jh...@apache.org>.

> On Nov 15, 2015, at 6:50 PM, Li Yang <li...@apache.org> wrote:
> 
> https://issues.apache.org/jira/browse/CALCITE-969

Li Yang, Thanks for logging that.

I realized that we have two issues, so I logged https://issues.apache.org/jira/browse/CALCITE-970 for the other part. 970 makes NULL values sort HIGH, by default, which is consistent with Oracle.

But it will change some SQL behavior. Currently

  ORDER BY x DESC

is equivalent to

  ORDER BY x DESC NULLS LAST

but after 970 it will be equivalent to

  ORDER BY x DESC NULLS FIRST

If you like the current behavior, you can set defaultNullCollation=LAST. Also, I tightened up what happens in RelNode land; we now use NullDirection.UNSPECIFIED a lot less than we used to.

Please holler if you don’t like 970.

Julian


Re: About NULL's position in order by

Posted by Li Yang <li...@apache.org>.
https://issues.apache.org/jira/browse/CALCITE-969


On Sat, Nov 14, 2015 at 5:12 AM, Julian Hyde <jh...@apache.org> wrote:

> Can you log a JIRA case please?
>
> For extra credit, the fix would allow the position of nulls to be
> configurable. In CalciteConnectionProperty, add
>
>   enum NullCollation { LOW, HIGH, FIRST, LAST }
>
> and
>
>   DEFAULT_NULL_COLLATION, default HIGH (last for ASC, first for DESC).
>
> The minimum fix is to support HIGH without looking at a config property.
>
> Julian
>
>
> > On Nov 12, 2015, at 9:19 PM, Li Yang <li...@apache.org> wrote:
> >
> > Sadly no.
> >
> > The first example (order by COUNTRY DESC) returns NULL at last, not like
> > Oracle's "NULLS FIRST is the default for descending order".
> >
> > On Fri, Nov 13, 2015 at 11:40 AM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> I don’t know whether we ever spelled out what Calcite’s behavior is. I
> >> don’t think the SQL standards specifies, so let’s look to Oracle. The
> >> Oracle specification is as follows[1]:
> >>
> >>> NULLS LAST is the default for ascending order, and NULLS FIRST is the
> >> default for descending order.
> >>
> >> Is Calcite consistent with that?
> >>
> >> Julian
> >>
> >>
> >>
> >>
> >>> On Nov 12, 2015, at 7:30 PM, Li Yang <li...@apache.org> wrote:
> >>>
> >>> Encountered a seemingly inconsistent behavior about NULL in order by.
> >> Could
> >>> anyone shed some light? (calcite 1.4 release)
> >>>
> >>> When a column contains NULL value and order by it desc, the NULL is
> >>> positioned at the end of result.
> >>>
> >>> E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC"
> >> returns
> >>>
> >>> 2013-08-11 US ....
> >>> 2013-08-18 FR  ....
> >>> 2013-03-31 null ....
> >>> 2013-11-10 null ....
> >>>
> >>> However if order by one more column, the NULL will come to the top this
> >>> time.
> >>>
> >>> E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC,
> CAL_DT
> >>> ASC"  returns
> >>>
> >>> 2013-03-31 null ....
> >>> 2013-11-10 null ....
> >>> 2013-08-11 US ....
> >>> 2013-08-18 FR  ....
> >>>
> >>> I understand NULL's order comparing to other values is vague in SQL,
> >> either
> >>> putting it first or last is fine. However do expect the order is
> >>> consistent. From a user point of view, if it seems NULL is the
> smallest,
> >>> then it should always be so.
> >>>
> >>> Thanks
> >>
> >>
>
>

Re: About NULL's position in order by

Posted by Julian Hyde <jh...@apache.org>.
Can you log a JIRA case please?

For extra credit, the fix would allow the position of nulls to be configurable. In CalciteConnectionProperty, add

  enum NullCollation { LOW, HIGH, FIRST, LAST }

and

  DEFAULT_NULL_COLLATION, default HIGH (last for ASC, first for DESC).

The minimum fix is to support HIGH without looking at a config property.

Julian


> On Nov 12, 2015, at 9:19 PM, Li Yang <li...@apache.org> wrote:
> 
> Sadly no.
> 
> The first example (order by COUNTRY DESC) returns NULL at last, not like
> Oracle's "NULLS FIRST is the default for descending order".
> 
> On Fri, Nov 13, 2015 at 11:40 AM, Julian Hyde <jh...@apache.org> wrote:
> 
>> I don’t know whether we ever spelled out what Calcite’s behavior is. I
>> don’t think the SQL standards specifies, so let’s look to Oracle. The
>> Oracle specification is as follows[1]:
>> 
>>> NULLS LAST is the default for ascending order, and NULLS FIRST is the
>> default for descending order.
>> 
>> Is Calcite consistent with that?
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Nov 12, 2015, at 7:30 PM, Li Yang <li...@apache.org> wrote:
>>> 
>>> Encountered a seemingly inconsistent behavior about NULL in order by.
>> Could
>>> anyone shed some light? (calcite 1.4 release)
>>> 
>>> When a column contains NULL value and order by it desc, the NULL is
>>> positioned at the end of result.
>>> 
>>> E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC"
>> returns
>>> 
>>> 2013-08-11 US ....
>>> 2013-08-18 FR  ....
>>> 2013-03-31 null ....
>>> 2013-11-10 null ....
>>> 
>>> However if order by one more column, the NULL will come to the top this
>>> time.
>>> 
>>> E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC, CAL_DT
>>> ASC"  returns
>>> 
>>> 2013-03-31 null ....
>>> 2013-11-10 null ....
>>> 2013-08-11 US ....
>>> 2013-08-18 FR  ....
>>> 
>>> I understand NULL's order comparing to other values is vague in SQL,
>> either
>>> putting it first or last is fine. However do expect the order is
>>> consistent. From a user point of view, if it seems NULL is the smallest,
>>> then it should always be so.
>>> 
>>> Thanks
>> 
>> 


Re: About NULL's position in order by

Posted by Li Yang <li...@apache.org>.
Sadly no.

The first example (order by COUNTRY DESC) returns NULL at last, not like
Oracle's "NULLS FIRST is the default for descending order".

On Fri, Nov 13, 2015 at 11:40 AM, Julian Hyde <jh...@apache.org> wrote:

> I don’t know whether we ever spelled out what Calcite’s behavior is. I
> don’t think the SQL standards specifies, so let’s look to Oracle. The
> Oracle specification is as follows[1]:
>
> > NULLS LAST is the default for ascending order, and NULLS FIRST is the
> default for descending order.
>
> Is Calcite consistent with that?
>
> Julian
>
>
>
>
> > On Nov 12, 2015, at 7:30 PM, Li Yang <li...@apache.org> wrote:
> >
> > Encountered a seemingly inconsistent behavior about NULL in order by.
> Could
> > anyone shed some light? (calcite 1.4 release)
> >
> > When a column contains NULL value and order by it desc, the NULL is
> > positioned at the end of result.
> >
> > E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC"
> returns
> >
> > 2013-08-11 US ....
> > 2013-08-18 FR  ....
> > 2013-03-31 null ....
> > 2013-11-10 null ....
> >
> > However if order by one more column, the NULL will come to the top this
> > time.
> >
> > E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC, CAL_DT
> > ASC"  returns
> >
> > 2013-03-31 null ....
> > 2013-11-10 null ....
> > 2013-08-11 US ....
> > 2013-08-18 FR  ....
> >
> > I understand NULL's order comparing to other values is vague in SQL,
> either
> > putting it first or last is fine. However do expect the order is
> > consistent. From a user point of view, if it seems NULL is the smallest,
> > then it should always be so.
> >
> > Thanks
>
>

Re: About NULL's position in order by

Posted by Julian Hyde <jh...@apache.org>.
I don’t know whether we ever spelled out what Calcite’s behavior is. I don’t think the SQL standards specifies, so let’s look to Oracle. The Oracle specification is as follows[1]:

> NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.

Is Calcite consistent with that?

Julian




> On Nov 12, 2015, at 7:30 PM, Li Yang <li...@apache.org> wrote:
> 
> Encountered a seemingly inconsistent behavior about NULL in order by. Could
> anyone shed some light? (calcite 1.4 release)
> 
> When a column contains NULL value and order by it desc, the NULL is
> positioned at the end of result.
> 
> E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC"  returns
> 
> 2013-08-11 US ....
> 2013-08-18 FR  ....
> 2013-03-31 null ....
> 2013-11-10 null ....
> 
> However if order by one more column, the NULL will come to the top this
> time.
> 
> E.g.  "select cal_dt, country, ... from xxx order by COUNTRY DESC, CAL_DT
> ASC"  returns
> 
> 2013-03-31 null ....
> 2013-11-10 null ....
> 2013-08-11 US ....
> 2013-08-18 FR  ....
> 
> I understand NULL's order comparing to other values is vague in SQL, either
> putting it first or last is fine. However do expect the order is
> consistent. From a user point of view, if it seems NULL is the smallest,
> then it should always be so.
> 
> Thanks