You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by Jeffrey Lichtman <sw...@rcn.com> on 2005/11/02 20:46:30 UTC

Re: Derby, SQL 2003, UNIQUE constraints, unique indices and NULL (long)

>For those wondering how unique indices with multiple null values may
>be implemented, it can be done pretty simple: The columns with null
>values are not part of the index. This will if course affect the
>execution plan.
>
>Example:
>
>CREATE TABLE t (i INTEGER, a INTEGER NOT NULL, b INTEGER);
>CREATE UNIQUE INDEX idx1 ON t(i);
>CREATE UNIQUE INDEX idx2 ON t(a, b);

Here are some more examples, assuming that nulls are not stored in indexes:

CREATE TABLE t2 (i INTEGER, j INTEGER NOT NULL);
CREATE INDEX idx3 on t2(i);
CREATE UNIQUE INDEX idx4 on t2(j);

SELECT i FROM t2;  -- use index idx3 (covers query)
SELECT j FROM t2;  -- Do a table scan


>There is of course are other ways of implementing unique indices with
>multiple null values.
>--
>Bernt Marius Johnsen, Database Technology Group,

Yes. The comparison operations for datatypes are defined in the interface:

org.apache.derby.iapi.types.DataValueDescriptor

This interface defines two comparison methods:


         /**
          * Compare this Orderable with a given Orderable for the purpose of
          * index positioning.  This method treats nulls as ordered values -
          * that is, it treats SQL null as equal to null and less than all
          * other values.
          *
          * @param other         The Orderable to compare this one to.
          *
          * @return  <0 - this Orderable is less than other.
          *                       0 - this Orderable equals other.
          *                      >0 - this Orderable is greater than other.
      *
      *                  The code should not explicitly look for -1, or 1.
          *
          * @exception StandardException         Thrown on error
          */
         int compare(DataValueDescriptor other) throws StandardException;

         /**
          * Compare this Orderable with a given Orderable for the purpose of
          * qualification and sorting.  The caller gets to determine how nulls
          * should be treated - they can either be ordered values or unknown
          * values.
          *
          * @param op    Orderable.ORDER_OP_EQUALS means do an = comparison.
          *                              Orderable.ORDER_OP_LESSTHAN 
means compare this < other.
          * 
Orderable.ORDER_OP_LESSOREQUALS means compare this <= other.
          * @param other The DataValueDescriptor to compare this one to.
          * @param orderedNulls  True means to treat nulls as ordered values,
          *                                              that is, 
treat SQL null as equal to null, and less
          *                                              than all other values.
          *                                              False means 
to treat nulls as unknown values,
          *                                              that is, the 
result of any comparison with a null
          *                                              is the 
UNKNOWN truth value.
          * @param unknownRV             The return value to use if 
the result of the
          *                                              comparison 
is the UNKNOWN truth value.  In other
          *                                              words, if 
orderedNulls is false, and a null is
          *                                              involved in 
the comparison, return unknownRV.
          *                                              This 
parameter is not used orderedNulls is true.
          *
          * @return      true if the comparison is true (duh!)
          *
          * @exception StandardException         Thrown on error
          */
         boolean compare(
     int         op,
     DataValueDescriptor   other,
     boolean     orderedNulls,
     boolean     unknownRV)
                                 throws StandardException;

I don't know which compare method the store uses to determine 
uniqueness of a data value for a unique index. I suspect it either 
uses the first, or it uses the second with orderedNulls equal to 
TRUE. In any case, the interface already supports a technique to 
allow multiple nulls in a unique index. There may be some reason the 
store can't use this technique, though.


                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/

Re: Derby, SQL 2003, UNIQUE constraints, unique indices and NULL (long)

Posted by Mike Matrigali <mi...@sbcglobal.net>.

Jeffrey Lichtman wrote:
> 
>> > CREATE TABLE t2 (i INTEGER, j INTEGER NOT NULL);
>> > CREATE INDEX idx3 on t2(i);
>> > CREATE UNIQUE INDEX idx4 on t2(j);
>> >
>> > SELECT i FROM t2;  -- use index idx3 (covers query)
>> > SELECT j FROM t2;  -- Do a table scan
>>
>> Shouldn't the last SELECT use idx4 since the index is UNIQUE *and* j
>> is NOT NULL?
>> -- 
>> Bernt Marius Johnsen, Database Technology Group,
> 
> 
> Yes, you're right. I should have said:
> 
> CREATE TABLE t2 (i INTEGER, j INTEGER);
> CREATE INDEX idx3 on t2(i);
> CREATE UNIQUE INDEX idx4 on t2(j);
> 
> SELECT i FROM t2;  -- use index idx3 (covers query)
> SELECT j FROM t2;  -- Do a table scan
> 
> I've done a little more thinking about this problem. I suspect that the 
> store code uses the results of the same compare() method invocation to 
> determine where to insert a row into an index and also to determine 
> whether the row is a duplicate in a unique index. If Derby is to allow 
> multiple nulls in a unique index, the store must use the two different 
> null semantics for the two cases. For the question of where to insert 
> the row, nulls must be considered to be equal to each other, and for the 
> question of whether they are duplicates, they must be considered not 
> equal to each other.
> 
> If I'm right, it should be easy to change Derby to allow multiple nulls 
> in unique indexes. We still haven't heard from any of the store experts, 
> though. Mike?

I haven't thought this one all the way through, but thought I would
respond with at least what I know - I have been reticent to weigh
in on this uniqueness issue given the flack being thrown about.  So
I asked for some patience if I say something stupid below.

When I last looked at this issue
I looked at special casing columns with null's in the unique index and
that turned into a rat hole where all unique key index lookups would
go slower to support the multiple null case.

The current derby index implementation assumes all rows in the index are
unique and have a deterministic ordering, where a row in the index is 
all the key columns plus the the row location column which points to the
row in the base table.  So for "non-unique" indexes searching is always
done using all columns plus the row location column.  This basic 
assumption is used for searching, branch key specification, logical
undo recovery, and serializable locking considerations.  So from store
point of view during searching null's must compare equal.

For unique indexes searching is done using only the keys, and not the
the row location column.  In this way when a search is done to insert
a row into the unique index, it searches just for the keys (not 
including the unique row location) and if it
finds an existing non-deleted match it throws a duplicate key exception.

As jeff points out there may be a way to do the search using one compare
and the duplicate check using another, but no matter what from the store
point of view the null's are the same (yes I know SQL does not agree).
Recovery on a unique key index expects that searching for a key will
find ONE entry for the key and it will be the right one.  There are also
locking, searching and cost assumptions/optimizations that assume if you 
are looking for a key in the tree you will only get 1 row back.

So I think if it were decided to allow multiple null's in a constraint
or an index, then the actual underlying btree implementation to use 
would be the non-unique index implementation with some extra code
stuck in to do the datatype specific comparison necessary at insert 
time.  I think that code would look something like:
     o position beginning of scan using insert key, using current 
compare semantics.
     o scan all rows which match the key using current semantics
     o for each row use alternate compare semantics and if alternate
       compare says equal disallow insert.

This could be
done today with no store changes by having the execution engine first
do a serializable lookup on the row to be inserted and then inserting
if the row met SQL defined uniqueness definitions - but implementing
in store would be more efficient as at least it would only do one
search vs. 2.

I would worry about an implementation like the one Bernt suggests, as it
just seems to invite weird bugs when using the wrong index; but this
may just reflect my inexperience with the optimizer and execution engine.

The other day I heard a good use for the SQL constraint null semantics.
Imagine an application that tracked people by SS# but also included
people with no SS#.  In that case the SQL constraint would insure
uniqueness of the SS# while allowing null's for those with no SS#.

> 
>                        -        Jeff Lichtman
>                                 swazoo@rcn.com
>                                 Check out Swazoo Koolak's Web Jukebox at
>                                 http://swazoo.com/
> 
>

Re: Derby, SQL 2003, UNIQUE constraints, unique indices and NULL (long)

Posted by Jeffrey Lichtman <sw...@rcn.com>.

> > CREATE TABLE t2 (i INTEGER, j INTEGER NOT NULL);
> > CREATE INDEX idx3 on t2(i);
> > CREATE UNIQUE INDEX idx4 on t2(j);
> >
> > SELECT i FROM t2;  -- use index idx3 (covers query)
> > SELECT j FROM t2;  -- Do a table scan
>
>Shouldn't the last SELECT use idx4 since the index is UNIQUE *and* j
>is NOT NULL?
>--
>Bernt Marius Johnsen, Database Technology Group,

Yes, you're right. I should have said:

CREATE TABLE t2 (i INTEGER, j INTEGER);
CREATE INDEX idx3 on t2(i);
CREATE UNIQUE INDEX idx4 on t2(j);

SELECT i FROM t2;  -- use index idx3 (covers query)
SELECT j FROM t2;  -- Do a table scan

I've done a little more thinking about this problem. I suspect that 
the store code uses the results of the same compare() method 
invocation to determine where to insert a row into an index and also 
to determine whether the row is a duplicate in a unique index. If 
Derby is to allow multiple nulls in a unique index, the store must 
use the two different null semantics for the two cases. For the 
question of where to insert the row, nulls must be considered to be 
equal to each other, and for the question of whether they are 
duplicates, they must be considered not equal to each other.

If I'm right, it should be easy to change Derby to allow multiple 
nulls in unique indexes. We still haven't heard from any of the store 
experts, though. Mike?

                        -        Jeff Lichtman
                                 swazoo@rcn.com
                                 Check out Swazoo Koolak's Web Jukebox at
                                 http://swazoo.com/

Re: Derby, SQL 2003, UNIQUE constraints, unique indices and NULL (long)

Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.

>>>>>>>>>>>> Jeffrey Lichtman wrote (2005-11-02 11:46:30):
> 
> Here are some more examples, assuming that nulls are not stored in indexes:
> 
> CREATE TABLE t2 (i INTEGER, j INTEGER NOT NULL);
> CREATE INDEX idx3 on t2(i);
> CREATE UNIQUE INDEX idx4 on t2(j);
> 
> SELECT i FROM t2;  -- use index idx3 (covers query)
> SELECT j FROM t2;  -- Do a table scan

Shouldn't the last SELECT use idx4 since the index is UNIQUE *and* j
is NOT NULL?

-- 
Bernt Marius Johnsen, Database Technology Group, 
Sun Microsystems, Trondheim, Norway