Posted to user-java@ibatis.apache.org by Zoran Avtarovski <zo...@sparecreative.com> on 2005/06/17 03:46:35 UTC

OT: COUNT query speed issue

I have a quick question regarding the speed of a COUNT query I use to
generate a chart. It's too slow at present and I'm looking for suggestions
to speed it up.

I pass a collection of the last 30 days' data usage to a simple Flash
charting app, and I use the iBATIS query below in a loop thirty times to
get the data for each successive day. The table has about half a million
entries and will only be getting bigger.

SELECT COUNT(*) AS count FROM log
WHERE remote_id = #remoteId# AND log_time BETWEEN #lowDate# AND #highDate#


The complete query (all thirty days) takes over a minute to complete, which
causes timeouts. I did some profiling and the bottleneck is with the above
query. Clearly, I'm doing something wrong. Is there a better-formed query
that will improve speed, or should I be looking at a different approach to
this issue?

I know this sounds stupid, but I'm looking for pointers on best practice so
I don't have to revisit this issue down the track.


Zoran



Re: OT: COUNT query speed issue

Posted by Zoran Avtarovski <zo...@sparecreative.com>.
OK. So indexing multiple columns is pretty easy and made a huge difference
to speed. As for a single transaction, the answer is no: I thought that
waiting on all 30 queries in one transaction would leave the connection
open far too long.
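
For the archives, adding the composite index to an existing table is a
one-liner in MySQL (the index name here is just illustrative):

ALTER TABLE log ADD INDEX remote_time (remote_id, log_time);

With remote_id leading the index, the WHERE clause on remote_id and
log_time should be satisfied from the index rather than a scan over the
whole log table.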

Z.


> Also, do you do this in the scope of a single transaction?
> 
> On 6/16/05, Ron Grabowski <ro...@yahoo.com> wrote:
>> Do things get faster if you have the database index remote_id and
>> log_time?
>> 




Re: OT: COUNT query speed issue

Posted by Larry Meadors <la...@gmail.com>.
Also, do you do this in the scope of a single transaction?

On 6/16/05, Ron Grabowski <ro...@yahoo.com> wrote:
> Do things get faster if you have the database index remote_id and
> log_time?
> 

Re: OT: COUNT query speed issue

Posted by Zoran Avtarovski <zo...@sparecreative.com>.
Thanks for the suggestion Larry,

That's exactly what I was thinking, but I guess I was concerned that it was
overkill. Do you know if there are any example approaches or methodologies
I can read up on?

Z.

> Is this historical data? What I mean by that is, does it change frequently?
> 
> If not, maybe a better approach would be to periodically create a
> table with the data in a summarized form.
> 
> From the original email, it sounds like this approach may work for
> data where the log_time is before today, but maybe not for today's
> data. If that is the case, you could union the two queries (one over
> the historical data, and one over the live data for today).
> 
> Larry
> 
> On 6/16/05, Zoran Avtarovski <zo...@sparecreative.com> wrote:
>>> Do things get faster if you have the database index remote_id and
>>> log_time?
>>> 
>> The log_time column is indexed, but I couldn't work out how to index two
>> columns in mysql.
>> 
>> Z.
>> 
>> 
>> 



Re: OT: COUNT query speed issue

Posted by Larry Meadors <la...@gmail.com>.
Is this historical data? What I mean by that is, does it change frequently? 

If not, maybe a better approach would be to periodically create a
table with the data in a summarized form.

From the original email, it sounds like this approach may work for
data where the log_time is before today, but maybe not for today's
data. If that is the case, you could union the two queries (one over
the historical data, and one over the live data for today).
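
Something like this is the general shape of what I mean (the table and
column names are just made up for the example, and DATE() needs MySQL 4.1
or later):

-- summarized form: one row per remote_id per day
CREATE TABLE log_daily(
   remote_id INTEGER UNSIGNED NOT NULL,
   log_date DATE NOT NULL,
   hit_count INTEGER UNSIGNED NOT NULL,
   PRIMARY KEY (remote_id, log_date)
);

-- refreshed periodically (e.g. nightly) for days that are already complete;
-- a real refresh would first clear the rows it is about to rewrite
INSERT INTO log_daily (remote_id, log_date, hit_count)
SELECT remote_id, DATE(log_time), COUNT(*)
FROM log
WHERE log_time < CURDATE()
GROUP BY remote_id, DATE(log_time);

-- chart query: historical counts from the summary, today's from the live table
SELECT log_date, hit_count
FROM log_daily
WHERE remote_id = #remoteId# AND log_date BETWEEN #lowDate# AND #highDate#
UNION ALL
SELECT CURDATE(), COUNT(*)
FROM log
WHERE remote_id = #remoteId# AND log_time >= CURDATE();

That way each request only touches a handful of summary rows plus one day
of live data, instead of counting over the half-million-row table thirty
times.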

Larry

On 6/16/05, Zoran Avtarovski <zo...@sparecreative.com> wrote:
> > Do things get faster if you have the database index remote_id and
> > log_time?
> >
> The log_time column is indexed, but I couldn't work out how to index two
> columns in mysql.
> 
> Z.
> 
> 
>

Re: OT: COUNT query speed issue

Posted by Zoran Avtarovski <zo...@sparecreative.com>.
Thanks Philippe,

I got past that. In the end I followed a suggestion by Nils and read the
MySQL Cookbook by O'Reilly (it's amazing how much a little effort pays off),
and changed my query to:

SELECT DATE_FORMAT(log_time, '%M %d') AS date, COUNT(*) AS count
FROM log
WHERE remote_id = #remoteId# AND log_time BETWEEN #lowDate# AND #highDate#
GROUP BY date


This returns a list of dates and associated counts in one SQL query.
Combined with the indexing, this has increased speed well beyond my wildest
imagination.
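
For completeness, the statement drops straight into a sqlMap along these
lines (the statement id and classes are only illustrative, not copied from
my actual map):

<!-- one row per day, e.g. {date=June 16, count=1234} -->
<select id="getDailyCounts" parameterClass="java.util.HashMap"
        resultClass="java.util.HashMap">
   SELECT DATE_FORMAT(log_time, '%M %d') AS date, COUNT(*) AS count
   FROM log
   WHERE remote_id = #remoteId# AND log_time BETWEEN #lowDate# AND #highDate#
   GROUP BY date
</select>

A single queryForList() call then returns the whole 30-day series for the
Flash chart in one round trip.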

Zoran

>> The log_time column is indexed, but I couldn't work out how to index two
>> columns in mysql.
> 
> Indexing two (or more) columns in MySQL is the same as indexing one:
> 
> CREATE TABLE a_table(
>    remote_id INTEGER UNSIGNED NOT NULL,
>    log_time DATE NOT NULL,
>    INDEX remote_time(remote_id, log_time)
> )
> 
> Simply comma-separate the column names in your INDEX statement... This
> is the MySQL >4.0 syntax. I don't know about previous versions.
> 
> Cheers,
> Philippe
> 
> Zoran Avtarovski wrote:
>>> Do things get faster if you have the database index remote_id and
>>> log_time?
>>> 
>> 
>> The log_time column is indexed, but I couldn't work out how to index two
>> columns in mysql.
>> 
>> Z.
>> 
>> 
>> 
> 



Re: OT: COUNT query speed issue

Posted by Philippe Laflamme <ph...@mail.mcgill.ca>.
> The log_time column is indexed, but I couldn't work out how to index two
> columns in mysql.

Indexing two (or more) columns in MySQL is the same as indexing one:

CREATE TABLE a_table(
   remote_id INTEGER UNSIGNED NOT NULL,
   log_time DATE NOT NULL,
   INDEX remote_time(remote_id, log_time)
)

Simply comma-separate the column names in your INDEX statement... This
is the MySQL >4.0 syntax. I don't know about previous versions.

Cheers,
Philippe

Zoran Avtarovski wrote:
>>Do things get faster if you have the database index remote_id and
>>log_time?
>>
> 
> The log_time column is indexed, but I couldn't work out how to index two
> columns in mysql.
> 
> Z.
> 
> 
> 


Re: OT: COUNT query speed issue

Posted by Zoran Avtarovski <zo...@sparecreative.com>.
> Do things get faster if you have the database index remote_id and
> log_time?
> 
The log_time column is indexed, but I couldn't work out how to index two
columns in mysql.

Z.



Re: OT: COUNT query speed issue

Posted by Ron Grabowski <ro...@yahoo.com>.
Do things get faster if you have the database index remote_id and
log_time?
