You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Amir Michail <am...@gmail.com> on 2008/10/20 00:15:50 UTC

How to efficiently check for existence of a row?

Hi,

Given some condition, how do you efficiently check whether there is at
least one row that satisfies that condition?

count(*) is slow.

Amir

-- 
http://chatbotgame.com
http://numbrosia.com
http://twitter.com/amichail

Re: How to efficiently check for existence of a row?

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.

Amir Michail <am...@gmail.com> writes:

> Hi,
>
> Given some condition, how do you efficiently check whether there is at
> least one row that satisfies that condition?
>
> count(*) is slow.

To check whether there are any rows in the table T where X < 4, you
could do something like this:

  VALUES 1 INTERSECT SELECT 1 FROM T WHERE X < 4
  (returns one row if there's a match, zero rows otherwise)

This should stop immediately when it finds a matching row. If there are
no matching rows, the entire table will be scanned unless you have an
index on the column X.

-- 
Knut Anders

RE: How to efficiently check for existence of a row?

Posted by Derby Discussion <de...@db.apache.org>.


> -----Original Message-----
> From: Johnny Kewl [mailto:john@kewlstuff.co.za]
> Sent: Monday, October 20, 2008 2:15 AM
> To: Derby Discussion
> Subject: Re: How to efficiently check for existence of a row?
> 
> 
> ----- Original Message -----
> From: "Amir Michail" <am...@gmail.com>
> To: "Derby Discussion" <de...@db.apache.org>
> Sent: Monday, October 20, 2008 12:15 AM
> Subject: How to efficiently check for existence of a row?
> 
> 
> > Hi,
> >
> > Given some condition, how do you efficiently check whether there is at
> > least one row that satisfies that condition?
> >
> > count(*) is slow.
> >
> Hi Amir
> 
> Looking at your posts, you seem to be relying on stat (aggregate
> functions)
> for everything... think about normal SQL like this...
> 
> String sSql = "SELECT id FROM " + dbTable + " ORDER BY id DESC";
> 
>                   Statement stmt= con.createStatement();
>                   stmt.setMaxRows(1);
>                   ResultSet resultSet = stmt.executeQuery(sSql);
>                   return resultSet;
> 
> So its the max ID because its sorted and it only returns one record...
> 
> Then also read up on "forward only" recordsets... normally the default and
> are very fast.
> And you MUST index the stuff... in this case the ID.
> 
The OP asked for efficient check for existence of a row (1 or more may
qualify).

The fastest solution is to do a simple query and check rs.next().
No need to do an ORDER BY clause because you don't care what order occurs.

If Derby behaves like Oracle, you'll start to get results as soon as they
are found. (The query continues to pump away at finding the data.) The
upside is that you can continue to process rows as they hit the solution
set.

The downside is that if you're using an ORDER BY CLAUSE or some other
aggregate function, GROUP BY, etc ... you won't get the result set until
it's the end of the query being processed.

So the first rs.next() gets your row and you then go on your way...

I believe that is the fastest to check existence.


HTH

-Mike

> Also read up on dB normalization... if you get that, you'll start building
> fast dB's regardles of size.
> Because you also cant just index every field... that also slows things
> down,
> on entry.
> 
> Try avoid making the dB bring 20 million records into memory and then
> counting the things...
> In the above because its indexed on ID and you bringing back only one
> record... its always fast.
> If the dB is normalized correctly, then what you will find is that WHEREs
> tend to act on indexed data.
> 
> .... is what I'm trying to show you
> 
> Its always possible to make terrible designs no matter what dB engine you
> using... always think about whats happening to memory as the dB gets
> larger...
> For that you have to learn dB normalization and about indexes...
> 
> Good Luck...
> 
> --------------------------------------------------------------------------
> -
> HARBOR : http://www.kewlstuff.co.za/index.htm
> The most powerful application server on earth.
> The only real POJO Application Server.
> See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
> --------------------------------------------------------------------------
> -
> If you cant pay in gold... get lost...
> http://coolharbor.100free.com/debt/usadebt.htm
>

Re: How to efficiently check for existence of a row?

Posted by Johnny Kewl <jo...@kewlstuff.co.za>.

----- Original Message ----- 
From: "Amir Michail" <am...@gmail.com>
To: "Derby Discussion" <de...@db.apache.org>
Sent: Monday, October 20, 2008 12:15 AM
Subject: How to efficiently check for existence of a row?

> Hi,
>
> Given some condition, how do you efficiently check whether there is at
> least one row that satisfies that condition?
>
> count(*) is slow.
>
Hi Amir

Looking at your posts, you seem to be relying on stat (aggregate functions) 
for everything... think about normal SQL like this...

String sSql = "SELECT id FROM " + dbTable + " ORDER BY id DESC";

                  Statement stmt= con.createStatement();
                  stmt.setMaxRows(1);
                  ResultSet resultSet = stmt.executeQuery(sSql);
                  return resultSet;

So its the max ID because its sorted and it only returns one record...

Then also read up on "forward only" recordsets... normally the default and 
are very fast.
And you MUST index the stuff... in this case the ID.

Also read up on dB normalization... if you get that, you'll start building 
fast dB's regardles of size.
Because you also cant just index every field... that also slows things down, 
on entry.

Try avoid making the dB bring 20 million records into memory and then 
counting the things...
In the above because its indexed on ID and you bringing back only one 
record... its always fast.
If the dB is normalized correctly, then what you will find is that WHEREs 
tend to act on indexed data.

.... is what I'm trying to show you

Its always possible to make terrible designs no matter what dB engine you 
using... always think about whats happening to memory as the dB gets 
larger...
For that you have to learn dB normalization and about indexes...

Good Luck...

---------------------------------------------------------------------------
HARBOR : http://www.kewlstuff.co.za/index.htm
The most powerful application server on earth.
The only real POJO Application Server.
See it in Action : http://www.kewlstuff.co.za/cd_tut_swf/whatisejb1.htm
---------------------------------------------------------------------------
If you cant pay in gold... get lost...
http://coolharbor.100free.com/debt/usadebt.htm

Re: How to efficiently check for existence of a row?

Posted by Daniel Noll <da...@nuix.com>.

Amir Michail wrote:
> Hi,
> 
> Given some condition, how do you efficiently check whether there is at
> least one row that satisfies that condition?
> 
> count(*) is slow.

I too wish that count(*) were fast, at least for queries which are only 
using indexes. :-(

What about just doing the query and checking rs.next()?  setMaxRows(1) 
should theoretically speed it up, but I haven't noticed performance 
issues which would justify setting it.

Daniel

-- 
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software