You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Damian Carey <ja...@gmail.com> on 2010/06/12 07:58:25 UTC

When is multi field index preferable over single field ?

Hello,

We have a simple database, only 3 tables with a few fields in each)
that contains data supplied by a third party. Almost entirely read
only.

The table of concern has about 380K rows, and on each retrieval we
will retrieve between 10 and 500 rows.

I am looking for some BASIC advice on which indexes should be created
to assist with standard queries.

I am just looking for reasonable advice, and am not worried about any
super optimal solution.

The two basic queries are as follows ...

SELECT * FROM MYTABLE qt WHERE qt.scheduleId = 'ABC' AND qt.body = '4DR'

and

SELECT DISTINCT qt.body FROM MYTABLE qt WHERE qt.scheduleId = 'ABC'

Are "multi field" indexes preferable or would single field indexes work OK.

Perhaps something like this for a multi field ....

CREATE INDEX T_SCHED_BODY_IDX ON MYTABLE (scheduleId, body)

Or perhaps something like this for a single field ....
CREATE INDEX T_SCHED_IDX ON MYTABLE (scheduleId)
CREATE INDEX T_BODY_IDX ON MYTABLE (body)

Any guidance, advice, clues, suggestions, experience would be most appreciated.

Many thanks,
-Damian

Re: When is multi field index preferable over single field ?

Posted by Damian Carey <ja...@gmail.com>.
Knut,

Awesome!

All sort of really obvious once I read your response.  Thanks for
providing "what" and "why". That has really helped my understanding of
indexes.

Many thanks,
-Damian


On Sun, Jun 13, 2010 at 9:24 PM, Knut Anders Hatlen <Kn...@sun.com> wrote:
> On 06/12/10 07:58 AM, Damian Carey wrote:
>> Hello,
>>
>> We have a simple database, only 3 tables with a few fields in each)
>> that contains data supplied by a third party. Almost entirely read
>> only.
>>
>> The table of concern has about 380K rows, and on each retrieval we
>> will retrieve between 10 and 500 rows.
>>
>> I am looking for some BASIC advice on which indexes should be created
>> to assist with standard queries.
>>
>> I am just looking for reasonable advice, and am not worried about any
>> super optimal solution.
>>
>> The two basic queries are as follows ...
>>
>> SELECT * FROM MYTABLE qt WHERE qt.scheduleId = 'ABC' AND qt.body = '4DR'
>>
>> and
>>
>> SELECT DISTINCT qt.body FROM MYTABLE qt WHERE qt.scheduleId = 'ABC'
>>
>> Are "multi field" indexes preferable or would single field indexes work OK.
>>
>> Perhaps something like this for a multi field ....
>>
>> CREATE INDEX T_SCHED_BODY_IDX ON MYTABLE (scheduleId, body)
>>
>> Or perhaps something like this for a single field ....
>> CREATE INDEX T_SCHED_IDX ON MYTABLE (scheduleId)
>> CREATE INDEX T_BODY_IDX ON MYTABLE (body)
>>
>> Any guidance, advice, clues, suggestions, experience would be most appreciated.
>>
>
> Hi Damian,
>
> I think the multi-field index would be preferable in both of these cases.
>
> 1) SELECT * FROM MYTABLE qt WHERE qt.scheduleId = 'ABC' AND qt.body = '4DR'
>
>
> Here, if we only have the two single-field indexes, the query will only
> use one of them. That means (if the first index is picked) that the
> query will have to look at all the rows that have scheduleId='ABC' and
> check if they also have body='4DR'. If many of those rows don't have
> body='4DR', we have wasted time looking at rows that shouldn't be
> included in the result.
>
> With the multi-field index, we'll only have to look at the rows that
> have both scheduleId='ABC' and body='4DR', so we don't waste time
> looking at rows that don't qualify.
>
> 2) SELECT DISTINCT qt.body FROM MYTABLE qt WHERE qt.scheduleId = 'ABC'
>
>
> In this query, a single-field index on scheduleId and a multi-field
> index with scheduleId as the first field will end up looking at the same
> number of rows. However, the multi-field index gives these additional
> benefits:
>
>    * Since the index also contains the field body, all the data needed
>      to produce the result can be found in the index. This saves one
>      lookup in the base table per returned row.
>    * The multi-field index will return the rows ordered on body, so we
>      don't need to sort the result or build a hash table in order to
>      eliminate duplicates as required by the DISTINCT keyword.
>
>
> Hope this helps,
>
> --
> Knut Anders
>
>

Re: When is multi field index preferable over single field ?

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
On 06/12/10 07:58 AM, Damian Carey wrote:
> Hello,
>
> We have a simple database, only 3 tables with a few fields in each)
> that contains data supplied by a third party. Almost entirely read
> only.
>
> The table of concern has about 380K rows, and on each retrieval we
> will retrieve between 10 and 500 rows.
>
> I am looking for some BASIC advice on which indexes should be created
> to assist with standard queries.
>
> I am just looking for reasonable advice, and am not worried about any
> super optimal solution.
>
> The two basic queries are as follows ...
>
> SELECT * FROM MYTABLE qt WHERE qt.scheduleId = 'ABC' AND qt.body = '4DR'
>
> and
>
> SELECT DISTINCT qt.body FROM MYTABLE qt WHERE qt.scheduleId = 'ABC'
>
> Are "multi field" indexes preferable or would single field indexes work OK.
>
> Perhaps something like this for a multi field ....
>
> CREATE INDEX T_SCHED_BODY_IDX ON MYTABLE (scheduleId, body)
>
> Or perhaps something like this for a single field ....
> CREATE INDEX T_SCHED_IDX ON MYTABLE (scheduleId)
> CREATE INDEX T_BODY_IDX ON MYTABLE (body)
>
> Any guidance, advice, clues, suggestions, experience would be most appreciated.
>   

Hi Damian,

I think the multi-field index would be preferable in both of these cases.

1) SELECT * FROM MYTABLE qt WHERE qt.scheduleId = 'ABC' AND qt.body = '4DR'


Here, if we only have the two single-field indexes, the query will only
use one of them. That means (if the first index is picked) that the
query will have to look at all the rows that have scheduleId='ABC' and
check if they also have body='4DR'. If many of those rows don't have
body='4DR', we have wasted time looking at rows that shouldn't be
included in the result.

With the multi-field index, we'll only have to look at the rows that
have both scheduleId='ABC' and body='4DR', so we don't waste time
looking at rows that don't qualify.

2) SELECT DISTINCT qt.body FROM MYTABLE qt WHERE qt.scheduleId = 'ABC'


In this query, a single-field index on scheduleId and a multi-field
index with scheduleId as the first field will end up looking at the same
number of rows. However, the multi-field index gives these additional
benefits:

    * Since the index also contains the field body, all the data needed
      to produce the result can be found in the index. This saves one
      lookup in the base table per returned row.
    * The multi-field index will return the rows ordered on body, so we
      don't need to sort the result or build a hash table in order to
      eliminate duplicates as required by the DISTINCT keyword.


Hope this helps,

-- 
Knut Anders