You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Sai Pullabhotla <sa...@jmethods.com> on 2008/12/24 13:23:48 UTC

Need Advice on Indexes

Dear Developers,

I currently have the following table and index in a Derby database.

create table job(
	job_id bigint not null,
	proj_id integer not null,
	proj_name varchar(50) not null,
	proj_folder varchar(256) not null,
	submit_user varchar(20) not null,
	run_user varchar(20) not null,
	queue_priority integer not null,
	run_priority integer not null,
	submit_time timestamp not null,
	start_time timestamp not null,
	end_time timestamp,
	time_taken bigint,
	job_log varchar(256) not null,
	status char(1) default 'A' not null,
	remarks varchar(512),
	sched_job_id integer,
	
	primary key(job_id),
	check(status in('A', 'S', 'F', 'C'))
);

create index idx_job on job (proj_id desc, proj_name desc, proj_folder
desc, start_time desc, end_time desc, run_user desc, time_taken desc,
status desc, sched_job_id desc);

This table would typically have a few hundred thousand records
(~500,000). Records should be searchable by a combination of date
range (start_time), project name, project folder, run user, status
(one or more status codes using IN clause). To further clarify this
search criteria, date range is always required and the rest of the
parameters are optional.

There is also a need to search records by job_id alone as well as
sched_job_id alone. In the first case, there will be at most one
result, but in the second case there could be several.

The search results should be sortable pretty much by any column except
remarks and job log.

We also have a need to display the job count based on the status code
for today, yesterday, last week and last month. In other words, the
last week statistics would show how many jobs were successful, how
many were failed etc. in the past one week.

I'm wondering if the above index I've is good enough to perform all
these functions fast enough. Please advise.

Regards,

Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861
www.jMethods.com

Re: Need Advice on Indexes

Posted by Sai Pullabhotla <sa...@jmethods.com>.

Thanks for letting me know how the indexes work. I will play with it
some more by changing the order of indexes/changing the select
statements and see what the optimizer does.

Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861
www.jMethods.com




On Thu, Dec 25, 2008 at 6:32 PM,  <de...@segel.com> wrote:
>
> Sigh.
>
>
> Sai,
>
> You post a question to this mailing list looking for help.
> The help you are getting is free so take it with a grain of salt.
> But also consider that the people who are taking the time to help you are
> seasoned database professionals who have a lot of expertise.
>
> You are trying to rationalize a bad design. We know that it is a bad design
> because you're here asking for help because your application does not
> perform well.
>
> I would suggest that you consider what is being said and then taking it back
> to your team so you can fix your design.
>
> Not having indexes is bad. Fixing this doesn't fix your poor design. It will
> improve some performance issues, but you'll not get the most from your
> database.
>
> I understand that you're trying to store your log data in to a database so
> that you can take advantage of what a database does best.
>
> But if you're going to take the time and effort to do this, why don't you do
> it right from the start and do the little extra work to make your life
> simpler in the long run.
>
> There's an expression... '... there is more than one way to skin a cat ...'.
> This is true when it comes to solving your problems in software.
>
> If you add a column to your user table a emp_status char(1), you can store
> values like 'A' for active, or 'N' for non active. Or if your database
> supports binary data types, True or False for active or non-active.
>
> You can then have your program get the correct user id based on username and
> emp_status. You then use this information when you insert the record in to
> your job table. This will 'shrink' the width of the columns. (I'm not going
> to suggest alternatives like triggers and stored procedures but they also
> work well...)
>
> You can then do the same for your project table.
>
> This should clean up your table design.
>
> Getting in to your questions about indexes...
>
> First, lets take a look at your query:
>> select  job_id, proj_id, proj_name, proj_folder, submit_user,
>> run_user, queue_priority, run_priority, submit_time, start_time,
>> end_time, time_taken, job_log, status from dpa_job where start_time>=?
>> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
>>
>
> I rewrote it in the following format.
>
> SELECT  job_id, proj_id, proj_name, proj_folder,
>        submit_user, run_user, queue_priority,
>        run_priority, submit_time, start_time, end_time,
>        time_taken, job_log, status
> FROM    dpa_job
> WHERE   start_time>=?
> AND     start_time<?
> AND     status IN(?, ?, ?, ?)
> ORDER BY start_time desc
>
> The reason you want to do this is that you now have visual cues to look at
> your query and see what is going on.  It doesn't take much more time and if
> you make this a habit, you'll get even faster at spotting problems.
>
> The SELECT clause shows you the columns that you're displaying.
> The FROM clause shows you where the data is coming from.
> The WHERE clause indicate what filters are being used to limit the data.
> (Its really the WHERE, AND, and OR statements.)
> This is where you need to focus your attention when you want to create
> indexes that your table will use.
>
> In this query, you are filtering on only two columns. So if you want to
> improve the query's performance, you want an index that is ordered on one or
> both of these columns. Since start_time is more unique than status, you'll
> want the index to order first on start_time, and then on status. You may
> want to not use status in the compound index unless you really have a lot of
> different status types. (It looks like you only have a handful of status
> types.
>
> If you have other queries, you'd want to perform the same exercise and see
> how the queries filter the datasets from each table and then see if you
> already have an index that could be used. If not, then you'd want to add an
> index.
>
> An example...
> Suppose you want to find all of the jobs submitted for a given project by a
> given user.
>
> Your WHERE clause would look like:
> WHERE  proj_id = ?
> AND    submit_user = ?
>
> If you want to limit the time stamp range you'd add the following:
> AND    start_time >=?
> AND    start_time <=?
>
> You could then index on proj_id, submit_user, and possibly start_time in
> that order.
>
> (Since you may have an index on start_time, you may want to check to see
> which index the optimizer is going to use. It is possible that the optimizer
> uses the index on start_time and then filters the data set by the proj_id
> and submit_user. This is going to be database dependent.)
>
> Since you have a database and data, you can use ij to create and drop
> indexes and test the queries to see how they perform.
>
> I would suggest that you look at a good relational database design book to
> see how to improve performance and how relational databases work.
>
> Way back when, I used C J Date's book in school.
> The text I used is way out of print, but here's a list of CJ Date's books
> that can be found on Amazon:
> http://www.amazon.com/s/qid=1230251379/ref=sr_pg_2?ie=UTF8&rs=1000&sort=date
> rank&keywords=Relational%20Database&rh=i%3Astripbooks%2Ck%3ARelational%20Dat
> abase%2Cp_27%3AC.%20J.%20Date&page=1
>
> This may be a good start:
> http://www.amazon.com/Database-Depth-Relational-Theory-Practitioners/dp/0596
> 100124/ref=sr_1_6?ie=UTF8&s=books&qid=1230251480&sr=1-6
>
> HTH
>
> -Mike
>
>> -----Original Message-----
>> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
>> Sent: Thursday, December 25, 2008 11:06 AM
>> To: Derby Discussion
>> Subject: Re: Need Advice on Indexes
>>
>> Mike,
>>
>> We went through all of these during the design phase and decided that
>> we do not have to do the kind of normalization you have mentioned.
>> Below are a couple of reasons:
>>
>> As stated before, my jobs table is nothing more than a plain LOG file
>> which is stored in a database table instead of a regular file on the
>> file system. The only reason why we needed to store it in the database
>> is for easy search/filtering of records. If I were to use a plain log
>> file, I don't think I would want to print a user ID (a number) to a
>> log file which does not make much sense to the users reading it. I
>> guess the same holds true regardless of the underlying data store
>> (plain file or database table or something else).
>>
>> About not really deleting a user record and having a "status" field in
>> the user table, we did go through this and here are the issues with
>> this approach:
>> Let us say my table definition is -
>> CREATE TABLE user(
>>     user_id integer primary key,
>>     user_name char(20),
>>     full_name varchar(50));
>>
>> We do want to have a unique constraint on the user_name field. If we
>> do not purge the user record when a delete is requested, then I cannot
>> have a new user with the same name as a user that was deleted 2 years
>> ago. Same is true even if you create a composite unique key with
>> user_name and status fields. Correct me if I'm wrong.
>>
>> You said that -
>>
>> > Based on this query, your index would never be used.
>>
>> Why not and how do I make Derby to use my index in various scenarios
>> I've mentioned.
>>
>> Thanks.
>> Sai Pullabhotla
>> Phone: (402) 408-5753
>> Fax: (402) 408-6861
>> www.jMethods.com
>>
>>
>>
>>
>> On Wed, Dec 24, 2008 at 4:10 PM,  <de...@segel.com> wrote:
>> > Sai, please see my comments in yours...
>> >
>> >> -----Original Message-----
>> >> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
>> >> Sent: Wednesday, December 24, 2008 1:54 PM
>> >> To: Derby Discussion; msegel@segel.com
>> >> Subject: Re: Need Advice on Indexes
>> >>
>> >> Rick and Mike,
>> >>
>> >> Thanks for taking time to reply to my question.
>> >>
>> >> As far as normalizing the table that Mike suggested -
>> >>
>> >> this jobs table is nothing more than a history table. So, a project or
>> >> user exists today but may not exist tomorrow. However, we want to
>> >> maintain the history of every project that was executed (this is
>> >> considered a job and hence the name). Since, a user may not exist in a
>> >> future time, I can not maintain a foreign key to my user table even
>> >> though I've a user table and an ID column in it. Same with projects. I
>> >> hope it makes sense now why this table is a flat table. However, I
>> >> think I can move the remarks column into its own table and use a
>> >> longvarchar or clob field.
>> >>
>> > Ok,
>> > That's not a good design. Your user information should be stored
>> separately
>> > and have an id associated with it. You don't want to purge this
>> information;
>> > however you can associate a status field as part of this table. So you
>> > retain and don't reuse the ids and you can still keep historical data
>> > intact.
>> >
>> > One of the problems you may have is how you populate the field with the
>> > employee's name. What happens if you have two employees with the same
>> name?
>> > (Yes it happens. ;-)
>> > The other problem is what happens when the person entering the name
>> > misspells the name? (Segel != Segal,Siegel,Siegal,Seagull) :-)
>> >
>> > In addition, your table isn't normalized when you clearly have a 1 to
>> many
>> > relationship of data. Conceptually there are multiple jobs to a project.
>> So
>> > when you store jobs and projects in the same data, its not normalized.
>> >
>> > Note: There are times when you break from using 3rd normal form, however
>> it
>> > requires a lot of practical experience to justify it.
>> >
>> >
>> >> Now, as far as the queries go, Most of the times, it is -
>> >>
>> >> select  job_id, proj_id, proj_name, proj_folder, submit_user,
>> >> run_user, queue_priority, run_priority, submit_time, start_time,
>> >> end_time, time_taken, job_log, status from dpa_job where start_time>=?
>> >> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
>> >>
>> > Ok,
>> > Not to nitpick, but I'm a bit old school and when writing a query I tend
>> to
>> > want to see it formatted to be easier to read. Taking your query...
>> >
>> > SELECT  job_id, proj_id, proj_name, proj_folder,
>> >        submit_user, run_user, queue_priority,
>> >        run_priority, submit_time, start_time, end_time,
>> >        time_taken, job_log, status
>> > FROM    dpa_job
>> > WHERE   start_time>=?
>> > AND     start_time<?
>> > AND     status IN(?, ?, ?, ?)
>> > ORDER BY start_time desc
>> >
>> > It's a bit easier to read. The key words are capitalized and you can see
>> > your queries.
>> >
>> > Now looking at your common query example, the only thing you need to
>> index
>> > on is the start time column. You can add status to the query, however
>> its
>> > not going to really impact the performance of the query... unless you
>> really
>> > have a lot of rows. (Like more than 10,000 rows in the result set if you
>> > ignore the status filter.)
>> >
>> > Does the IN clause effect the performance? There was an issue earlier in
>> > Derby, but I think it has been fixed for a couple of versions. In your
>> > example query with only 4 values, it is not that big of an issue.
>> >
>> > Based on this query, your index would never be used.
>> >
>> > You've indicated that this is a production database with customers using
>> it.
>> > I'm going to strongly suggest that your company hire a competent DBA.
>> You've
>> > said a couple of things that are major red flags.  When I say DBA, I
>> mean
>> > someone who could be considered a logical DBA and has both data modeling
>> and
>> > app development expertise. If not a FTE, get a consultant in to do a
>> health
>> > check.
>> >
>> >
>> >
>> >> Sai Pullabhotla
>> >> Phone: (402) 408-5753
>> >> Fax: (402) 408-6861
>> >> www.jMethods.com
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>
>
>

RE: Need Advice on Indexes

Posted by de...@segel.com.

Sigh.

Sai,

You post a question to this mailing list looking for help. 
The help you are getting is free so take it with a grain of salt.
But also consider that the people who are taking the time to help you are
seasoned database professionals who have a lot of expertise.

You are trying to rationalize a bad design. We know that it is a bad design
because you're here asking for help because your application does not
perform well.

I would suggest that you consider what is being said and then taking it back
to your team so you can fix your design. 

Not having indexes is bad. Fixing this doesn't fix your poor design. It will
improve some performance issues, but you'll not get the most from your
database.

I understand that you're trying to store your log data in to a database so
that you can take advantage of what a database does best.

But if you're going to take the time and effort to do this, why don't you do
it right from the start and do the little extra work to make your life
simpler in the long run.

There's an expression... '... there is more than one way to skin a cat ...'.
This is true when it comes to solving your problems in software.

If you add a column to your user table a emp_status char(1), you can store
values like 'A' for active, or 'N' for non active. Or if your database
supports binary data types, True or False for active or non-active.

You can then have your program get the correct user id based on username and
emp_status. You then use this information when you insert the record in to
your job table. This will 'shrink' the width of the columns. (I'm not going
to suggest alternatives like triggers and stored procedures but they also
work well...)

You can then do the same for your project table.

This should clean up your table design.

Getting in to your questions about indexes...

First, lets take a look at your query:
> select  job_id, proj_id, proj_name, proj_folder, submit_user, 
> run_user, queue_priority, run_priority, submit_time, start_time, 
> end_time, time_taken, job_log, status from dpa_job where start_time>=?
> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
> 

I rewrote it in the following format.

SELECT  job_id, proj_id, proj_name, proj_folder, 
        submit_user, run_user, queue_priority, 
        run_priority, submit_time, start_time, end_time, 
        time_taken, job_log, status 
FROM    dpa_job 
WHERE   start_time>=?
AND     start_time<? 
AND     status IN(?, ?, ?, ?) 
ORDER BY start_time desc

The reason you want to do this is that you now have visual cues to look at
your query and see what is going on.  It doesn't take much more time and if
you make this a habit, you'll get even faster at spotting problems.

The SELECT clause shows you the columns that you're displaying.
The FROM clause shows you where the data is coming from.
The WHERE clause indicate what filters are being used to limit the data. 
(Its really the WHERE, AND, and OR statements.)
This is where you need to focus your attention when you want to create
indexes that your table will use.

In this query, you are filtering on only two columns. So if you want to
improve the query's performance, you want an index that is ordered on one or
both of these columns. Since start_time is more unique than status, you'll
want the index to order first on start_time, and then on status. You may
want to not use status in the compound index unless you really have a lot of
different status types. (It looks like you only have a handful of status
types.

If you have other queries, you'd want to perform the same exercise and see
how the queries filter the datasets from each table and then see if you
already have an index that could be used. If not, then you'd want to add an
index.

An example... 
Suppose you want to find all of the jobs submitted for a given project by a
given user.

Your WHERE clause would look like:
WHERE  proj_id = ?
AND    submit_user = ?

If you want to limit the time stamp range you'd add the following:
AND    start_time >=?
AND    start_time <=?

You could then index on proj_id, submit_user, and possibly start_time in
that order.

(Since you may have an index on start_time, you may want to check to see
which index the optimizer is going to use. It is possible that the optimizer
uses the index on start_time and then filters the data set by the proj_id
and submit_user. This is going to be database dependent.)

Since you have a database and data, you can use ij to create and drop
indexes and test the queries to see how they perform. 

I would suggest that you look at a good relational database design book to
see how to improve performance and how relational databases work.

Way back when, I used C J Date's book in school.
The text I used is way out of print, but here's a list of CJ Date's books
that can be found on Amazon:
http://www.amazon.com/s/qid=1230251379/ref=sr_pg_2?ie=UTF8&rs=1000&sort=date
rank&keywords=Relational%20Database&rh=i%3Astripbooks%2Ck%3ARelational%20Dat
abase%2Cp_27%3AC.%20J.%20Date&page=1

This may be a good start:
http://www.amazon.com/Database-Depth-Relational-Theory-Practitioners/dp/0596
100124/ref=sr_1_6?ie=UTF8&s=books&qid=1230251480&sr=1-6

HTH

-Mike

> -----Original Message-----
> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
> Sent: Thursday, December 25, 2008 11:06 AM
> To: Derby Discussion
> Subject: Re: Need Advice on Indexes
> 
> Mike,
> 
> We went through all of these during the design phase and decided that
> we do not have to do the kind of normalization you have mentioned.
> Below are a couple of reasons:
> 
> As stated before, my jobs table is nothing more than a plain LOG file
> which is stored in a database table instead of a regular file on the
> file system. The only reason why we needed to store it in the database
> is for easy search/filtering of records. If I were to use a plain log
> file, I don't think I would want to print a user ID (a number) to a
> log file which does not make much sense to the users reading it. I
> guess the same holds true regardless of the underlying data store
> (plain file or database table or something else).
> 
> About not really deleting a user record and having a "status" field in
> the user table, we did go through this and here are the issues with
> this approach:
> Let us say my table definition is -
> CREATE TABLE user(
>     user_id integer primary key,
>     user_name char(20),
>     full_name varchar(50));
> 
> We do want to have a unique constraint on the user_name field. If we
> do not purge the user record when a delete is requested, then I cannot
> have a new user with the same name as a user that was deleted 2 years
> ago. Same is true even if you create a composite unique key with
> user_name and status fields. Correct me if I'm wrong.
> 
> You said that -
> 
> > Based on this query, your index would never be used.
> 
> Why not and how do I make Derby to use my index in various scenarios
> I've mentioned.
> 
> Thanks.
> Sai Pullabhotla
> Phone: (402) 408-5753
> Fax: (402) 408-6861
> www.jMethods.com
> 
> 
> 
> 
> On Wed, Dec 24, 2008 at 4:10 PM,  <de...@segel.com> wrote:
> > Sai, please see my comments in yours...
> >
> >> -----Original Message-----
> >> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
> >> Sent: Wednesday, December 24, 2008 1:54 PM
> >> To: Derby Discussion; msegel@segel.com
> >> Subject: Re: Need Advice on Indexes
> >>
> >> Rick and Mike,
> >>
> >> Thanks for taking time to reply to my question.
> >>
> >> As far as normalizing the table that Mike suggested -
> >>
> >> this jobs table is nothing more than a history table. So, a project or
> >> user exists today but may not exist tomorrow. However, we want to
> >> maintain the history of every project that was executed (this is
> >> considered a job and hence the name). Since, a user may not exist in a
> >> future time, I can not maintain a foreign key to my user table even
> >> though I've a user table and an ID column in it. Same with projects. I
> >> hope it makes sense now why this table is a flat table. However, I
> >> think I can move the remarks column into its own table and use a
> >> longvarchar or clob field.
> >>
> > Ok,
> > That's not a good design. Your user information should be stored
> separately
> > and have an id associated with it. You don't want to purge this
> information;
> > however you can associate a status field as part of this table. So you
> > retain and don't reuse the ids and you can still keep historical data
> > intact.
> >
> > One of the problems you may have is how you populate the field with the
> > employee's name. What happens if you have two employees with the same
> name?
> > (Yes it happens. ;-)
> > The other problem is what happens when the person entering the name
> > misspells the name? (Segel != Segal,Siegel,Siegal,Seagull) :-)
> >
> > In addition, your table isn't normalized when you clearly have a 1 to
> many
> > relationship of data. Conceptually there are multiple jobs to a project.
> So
> > when you store jobs and projects in the same data, its not normalized.
> >
> > Note: There are times when you break from using 3rd normal form, however
> it
> > requires a lot of practical experience to justify it.
> >
> >
> >> Now, as far as the queries go, Most of the times, it is -
> >>
> >> select  job_id, proj_id, proj_name, proj_folder, submit_user,
> >> run_user, queue_priority, run_priority, submit_time, start_time,
> >> end_time, time_taken, job_log, status from dpa_job where start_time>=?
> >> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
> >>
> > Ok,
> > Not to nitpick, but I'm a bit old school and when writing a query I tend
> to
> > want to see it formatted to be easier to read. Taking your query...
> >
> > SELECT  job_id, proj_id, proj_name, proj_folder,
> >        submit_user, run_user, queue_priority,
> >        run_priority, submit_time, start_time, end_time,
> >        time_taken, job_log, status
> > FROM    dpa_job
> > WHERE   start_time>=?
> > AND     start_time<?
> > AND     status IN(?, ?, ?, ?)
> > ORDER BY start_time desc
> >
> > It's a bit easier to read. The key words are capitalized and you can see
> > your queries.
> >
> > Now looking at your common query example, the only thing you need to
> index
> > on is the start time column. You can add status to the query, however
> its
> > not going to really impact the performance of the query... unless you
> really
> > have a lot of rows. (Like more than 10,000 rows in the result set if you
> > ignore the status filter.)
> >
> > Does the IN clause effect the performance? There was an issue earlier in
> > Derby, but I think it has been fixed for a couple of versions. In your
> > example query with only 4 values, it is not that big of an issue.
> >
> > Based on this query, your index would never be used.
> >
> > You've indicated that this is a production database with customers using
> it.
> > I'm going to strongly suggest that your company hire a competent DBA.
> You've
> > said a couple of things that are major red flags.  When I say DBA, I
> mean
> > someone who could be considered a logical DBA and has both data modeling
> and
> > app development expertise. If not a FTE, get a consultant in to do a
> health
> > check.
> >
> >
> >
> >> Sai Pullabhotla
> >> Phone: (402) 408-5753
> >> Fax: (402) 408-6861
> >> www.jMethods.com
> >>
> >>
> >>
> >>
> >
> >
> >
> >

Re: Need Advice on Indexes

Posted by Sai Pullabhotla <sa...@jmethods.com>.

Mike,

We went through all of these during the design phase and decided that
we do not have to do the kind of normalization you have mentioned.
Below are a couple of reasons:

As stated before, my jobs table is nothing more than a plain LOG file
which is stored in a database table instead of a regular file on the
file system. The only reason why we needed to store it in the database
is for easy search/filtering of records. If I were to use a plain log
file, I don't think I would want to print a user ID (a number) to a
log file which does not make much sense to the users reading it. I
guess the same holds true regardless of the underlying data store
(plain file or database table or something else).

About not really deleting a user record and having a "status" field in
the user table, we did go through this and here are the issues with
this approach:
Let us say my table definition is -
CREATE TABLE user(
    user_id integer primary key,
    user_name char(20),
    full_name varchar(50));

We do want to have a unique constraint on the user_name field. If we
do not purge the user record when a delete is requested, then I cannot
have a new user with the same name as a user that was deleted 2 years
ago. Same is true even if you create a composite unique key with
user_name and status fields. Correct me if I'm wrong.

You said that -

> Based on this query, your index would never be used.

Why not and how do I make Derby to use my index in various scenarios
I've mentioned.

Thanks.
Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861
www.jMethods.com




On Wed, Dec 24, 2008 at 4:10 PM,  <de...@segel.com> wrote:
> Sai, please see my comments in yours...
>
>> -----Original Message-----
>> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
>> Sent: Wednesday, December 24, 2008 1:54 PM
>> To: Derby Discussion; msegel@segel.com
>> Subject: Re: Need Advice on Indexes
>>
>> Rick and Mike,
>>
>> Thanks for taking time to reply to my question.
>>
>> As far as normalizing the table that Mike suggested -
>>
>> this jobs table is nothing more than a history table. So, a project or
>> user exists today but may not exist tomorrow. However, we want to
>> maintain the history of every project that was executed (this is
>> considered a job and hence the name). Since, a user may not exist in a
>> future time, I can not maintain a foreign key to my user table even
>> though I've a user table and an ID column in it. Same with projects. I
>> hope it makes sense now why this table is a flat table. However, I
>> think I can move the remarks column into its own table and use a
>> longvarchar or clob field.
>>
> Ok,
> That's not a good design. Your user information should be stored separately
> and have an id associated with it. You don't want to purge this information;
> however you can associate a status field as part of this table. So you
> retain and don't reuse the ids and you can still keep historical data
> intact.
>
> One of the problems you may have is how you populate the field with the
> employee's name. What happens if you have two employees with the same name?
> (Yes it happens. ;-)
> The other problem is what happens when the person entering the name
> misspells the name? (Segel != Segal,Siegel,Siegal,Seagull) :-)
>
> In addition, your table isn't normalized when you clearly have a 1 to many
> relationship of data. Conceptually there are multiple jobs to a project. So
> when you store jobs and projects in the same data, its not normalized.
>
> Note: There are times when you break from using 3rd normal form, however it
> requires a lot of practical experience to justify it.
>
>
>> Now, as far as the queries go, Most of the times, it is -
>>
>> select  job_id, proj_id, proj_name, proj_folder, submit_user,
>> run_user, queue_priority, run_priority, submit_time, start_time,
>> end_time, time_taken, job_log, status from dpa_job where start_time>=?
>> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
>>
> Ok,
> Not to nitpick, but I'm a bit old school and when writing a query I tend to
> want to see it formatted to be easier to read. Taking your query...
>
> SELECT  job_id, proj_id, proj_name, proj_folder,
>        submit_user, run_user, queue_priority,
>        run_priority, submit_time, start_time, end_time,
>        time_taken, job_log, status
> FROM    dpa_job
> WHERE   start_time>=?
> AND     start_time<?
> AND     status IN(?, ?, ?, ?)
> ORDER BY start_time desc
>
> It's a bit easier to read. The key words are capitalized and you can see
> your queries.
>
> Now looking at your common query example, the only thing you need to index
> on is the start time column. You can add status to the query, however its
> not going to really impact the performance of the query... unless you really
> have a lot of rows. (Like more than 10,000 rows in the result set if you
> ignore the status filter.)
>
> Does the IN clause effect the performance? There was an issue earlier in
> Derby, but I think it has been fixed for a couple of versions. In your
> example query with only 4 values, it is not that big of an issue.
>
> Based on this query, your index would never be used.
>
> You've indicated that this is a production database with customers using it.
> I'm going to strongly suggest that your company hire a competent DBA. You've
> said a couple of things that are major red flags.  When I say DBA, I mean
> someone who could be considered a logical DBA and has both data modeling and
> app development expertise. If not a FTE, get a consultant in to do a health
> check.
>
>
>
>> Sai Pullabhotla
>> Phone: (402) 408-5753
>> Fax: (402) 408-6861
>> www.jMethods.com
>>
>>
>>
>>
>
>
>
>

RE: Need Advice on Indexes

Posted by de...@segel.com.

Sai, please see my comments in yours...

> -----Original Message-----
> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
> Sent: Wednesday, December 24, 2008 1:54 PM
> To: Derby Discussion; msegel@segel.com
> Subject: Re: Need Advice on Indexes
> 
> Rick and Mike,
> 
> Thanks for taking time to reply to my question.
> 
> As far as normalizing the table that Mike suggested -
> 
> this jobs table is nothing more than a history table. So, a project or
> user exists today but may not exist tomorrow. However, we want to
> maintain the history of every project that was executed (this is
> considered a job and hence the name). Since, a user may not exist in a
> future time, I can not maintain a foreign key to my user table even
> though I've a user table and an ID column in it. Same with projects. I
> hope it makes sense now why this table is a flat table. However, I
> think I can move the remarks column into its own table and use a
> longvarchar or clob field.
> 
Ok,
That's not a good design. Your user information should be stored separately
and have an id associated with it. You don't want to purge this information;
however you can associate a status field as part of this table. So you
retain and don't reuse the ids and you can still keep historical data
intact.

One of the problems you may have is how you populate the field with the
employee's name. What happens if you have two employees with the same name?
(Yes it happens. ;-)
The other problem is what happens when the person entering the name
misspells the name? (Segel != Segal,Siegel,Siegal,Seagull) :-)

In addition, your table isn't normalized when you clearly have a 1 to many
relationship of data. Conceptually there are multiple jobs to a project. So
when you store jobs and projects in the same data, its not normalized.

Note: There are times when you break from using 3rd normal form, however it
requires a lot of practical experience to justify it.


> Now, as far as the queries go, Most of the times, it is -
> 
> select  job_id, proj_id, proj_name, proj_folder, submit_user,
> run_user, queue_priority, run_priority, submit_time, start_time,
> end_time, time_taken, job_log, status from dpa_job where start_time>=?
> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
> 
Ok,
Not to nitpick, but I'm a bit old school and when writing a query I tend to
want to see it formatted to be easier to read. Taking your query...

SELECT  job_id, proj_id, proj_name, proj_folder, 
        submit_user, run_user, queue_priority, 
        run_priority, submit_time, start_time, end_time, 
        time_taken, job_log, status 
FROM    dpa_job 
WHERE   start_time>=?
AND     start_time<? 
AND     status IN(?, ?, ?, ?) 
ORDER BY start_time desc

It's a bit easier to read. The key words are capitalized and you can see
your queries.

Now looking at your common query example, the only thing you need to index
on is the start time column. You can add status to the query, however its
not going to really impact the performance of the query... unless you really
have a lot of rows. (Like more than 10,000 rows in the result set if you
ignore the status filter.)

Does the IN clause effect the performance? There was an issue earlier in
Derby, but I think it has been fixed for a couple of versions. In your
example query with only 4 values, it is not that big of an issue. 

Based on this query, your index would never be used.

You've indicated that this is a production database with customers using it.
I'm going to strongly suggest that your company hire a competent DBA. You've
said a couple of things that are major red flags.  When I say DBA, I mean
someone who could be considered a logical DBA and has both data modeling and
app development expertise. If not a FTE, get a consultant in to do a health
check.



> Sai Pullabhotla
> Phone: (402) 408-5753
> Fax: (402) 408-6861
> www.jMethods.com
> 
> 
> 
>

Re: Need Advice on Indexes

Posted by Sai Pullabhotla <sa...@jmethods.com>.

Rick and Mike,

Thanks for taking time to reply to my question.

As far as normalizing the table that Mike suggested -

this jobs table is nothing more than a history table. So, a project or
user exists today but may not exist tomorrow. However, we want to
maintain the history of every project that was executed (this is
considered a job and hence the name). Since, a user may not exist in a
future time, I can not maintain a foreign key to my user table even
though I've a user table and an ID column in it. Same with projects. I
hope it makes sense now why this table is a flat table. However, I
think I can move the remarks column into its own table and use a
longvarchar or clob field.

Now, as far as the queries go, Most of the times, it is -

select  job_id, proj_id, proj_name, proj_folder, submit_user,
run_user, queue_priority, run_priority, submit_time, start_time,
end_time, time_taken, job_log, status from dpa_job where start_time>=?
and start_time<? and status in(?, ?, ?, ?) order by start_time desc

As stated in my original email, The date range will always be there. I
can get rid of the "and status in(?,?,?,?) if the user does not care
about the job's status. But, if the user selected one or more status
types, that IN clause needs to stay there. Not sure if that IN effects
performance and how.

Now another variation of the query (let us say, the user fills in
every parameter on the search screen), then the query looks like:

select  job_id, proj_id, proj_name, proj_folder, submit_user,
run_user, queue_priority, run_priority, submit_time, start_time,
end_time, time_taken, job_log, status from dpa_job where start_time>=?
and start_time<? and status in(?, ?) and run_user=? and proj_folder=?
and proj_name=? order by start_time desc

Now, the order by clause will have a different column if the user
chooses a custom sorting. But it is always by one column.

Another query we do is when a user wants to see the complete execution
history of a project (the project name could have changed since its
creation too). That's why we maintain the project ID in the job table.
This query looks like:

select  job_id, proj_id, proj_name, proj_folder, submit_user,
run_user, queue_priority, run_priority, submit_time, start_time,
end_time, time_taken, job_log, status from dpa_job where proj_id=? and
start_time>=? and start_time<? and status in(?, ?, ?, ?) order by
start_time desc

And finally, the select statements for showing the quick stats (show
the job count for today, yesterday, last week etc) here is the
statement:

select count(job_id) from dpa_job where start_time >= ? and start_time
< ? and status=?

This statement shows how many jobs were successful, failed, canceled
in the given date range.

Now the main reason I posted this question is because of the last
query (the quick stats) which one of our customer complained that it
takes too long like 5 to 10 minutes when he had about 500,000 records.
I did not notice my system taking that long when I tested with
300,000+ records.

Hope I've provided enough information for you. Let me know if you need
additional information.

Your help is greatly appreciated.

Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861
www.jMethods.com




On Wed, Dec 24, 2008 at 11:57 AM,  <de...@segel.com> wrote:
> Actually its not all that hard. :-)
>
>
> <soap box> Too many people are acting like 'physical' DBAs without
> understanding how a relational database or index works. Are there any good
> online tutorials which will 'teach' the basics? </soap box>
>
> <caveat> What I am writing is a generic statement or guideline to
> table/index design in general. It is not specific to derby/cloudscape/javadb
> and should be taken with a grain of salt. I am sure that there will be
> someone who disagrees and will tell you something different. As always YMMV.
> </caveat>
>
> Just on the surface a couple of things ...
>
> First, even without knowing your application or the rest of your database, I
> will say that you first need to normalize your data. Performance wise, this
> is pretty important.
>
> You're mixing and matching project information and job information in to one
> table. Also, just a hint. Its more efficient to store a user_id than the
> entire user name. (At most, a 4 byte word vs a varchar of 1-4 byte
> characters per character. [Unicode UTF-8 character set for
> internationalization]) Even considering ASCII, 'John Smith' on every row
> takes up more space than employee_id 112. And since you're tracking both the
> submit user id and run user id, that's twice the savings per row. ;-)
> So you'll want to create look up tables and create more of a star schema
> than just your flat table.
>
> Second, your index is almost as large as your table. You're creating an
> index on what looks like 10 of the 16 columns? (I probably miscounted.)
> This isn't a good idea. You'll see this sort of thing done because some
> engines (maybe all) will only search the index if all of the columns of the
> query are also in the index, so the idea is that you'll save yourself some
> seek times.
>
> Just a rule of thumb, in relational databases you really don't want to make
> it a habit of joining more than 3 or 4 tables together. With respect to
> indexes, when you get beyond 3 or 4 columns you're not going to get much
> more unique values.  The key is to use the index to limit your search for
> values, however once you get down to a reasonable subset, the database will
> be doing a sequential scan of your subset. In fact another rule of thumb is
> that if your table has less than 10,000 rows, most engines do not use an
> index.
>
> Another issue. Having large indexes will kill your performance on inserts
> and updates. (You add/change a row in the table, your index has to be
> changed as well.) So unless you have a situation where you have lots of
> reads and little updates/inserts, you're going to have performance issues.
>
> Also, in general, when you do a query on a table, most databases will only
> use one index. (Some databases like XPS or other data warehouse centric
> databases will use multiple indexes on the data.)
> The other big issue is that if your query is sorting on the 3rd or 4th
> column of the index, most likely the optimizer will not use the index.
> Using your index, that means if you have a statement like:
>
> SELECT *
> FROM   job
> WHERE  submit_user = "JOHN DOE"
>
> The database will not use the index but will do a sequential table scan.
>
> In looking at your example, its easy to see that you could split this in to
> a couple of different indexes.
>
> As Rick points out, without knowing your queries, its hard to say what
> indexes you'll need.
>
> First, split the table to normalize the data.
> Figure out the indexes needed for joining the table and also how you're
> going to query for a job.
>
> In addition... if you're really looking for the best performance, think
> about how often you'll actually look at a column in the join. You have a lot
> of varchar columns. Are they going to be long enough? What's the minimum
> you're going to be storing in the actual table (min length usually gets
> stored in the table while the rest is referenced from a different
> tablespace/page). If this table is constantly being used, but you're not
> actually looking at the remarks or comments (large varchar columns), you'll
> want to think about putting them in a separate table. Or even converting
> them to a text blob so that you don't have an artificial limitation.
> (Assuming that you'll want or need more that 512 characters in a comment.)
> Table width is still a factor, albeit, with each generation of advancements
> in hard drive, memory, I/O, this becomes less and less of an issue.
>
> The separating out the varchar is really, really necessary if you're doing a
> lot of look ups on the non varchar columns and need to squeeze the most out
> of your performance.
>
> Does this help?
>
> BTW, I think its great that you're thinking about compound indexes. (indexes
> using multiple columns.) Too often developers only think about a single
> column or the framework only creates a single column index for you. Then you
> end up with a lot of indexes and when you create your own indexes that are
> compound indexes, you'll need to add optimizer hints in your query
> statements. (If your database supports optimizer hints.)
>
> -Mike
>
> PS. Sorry if this seems a bit disjointed. I'm trying to write this as my
> wife is bossing me around the house. :-)
>
> Happy Holidays!
>
>
>> -----Original Message-----
>> From: Richard.Hillegas@Sun.COM [mailto:Richard.Hillegas@Sun.COM]
>> Sent: Wednesday, December 24, 2008 7:50 AM
>> To: Derby Discussion
>> Subject: Re: Need Advice on Indexes
>>
>> Hi Sai,
>>
>> It's hard to answer your question without seeing your queries. However,
>> for an index to be useful, some subset of the restricted columns in your
>> query must form a leading prefix of the indexed columns. So for
>> instance, your index would not be useful for the following query:
>>
>>   select * from job where proj_name  = 'Plan 9'
>>
>> because the leading column of the index, proj_id, is not restricted by
>> the query. If the timestamp ranges are always necessary in your queries,
>> then I would make them the leading columns in the index.
>>
>> Hope this helps,
>> -Rick
>>
>> Sai Pullabhotla wrote:
>> > Dear Developers,
>> >
>> > I currently have the following table and index in a Derby database.
>> >
>> > create table job(
>> >     job_id bigint not null,
>> >     proj_id integer not null,
>> >     proj_name varchar(50) not null,
>> >     proj_folder varchar(256) not null,
>> >     submit_user varchar(20) not null,
>> >     run_user varchar(20) not null,
>> >     queue_priority integer not null,
>> >     run_priority integer not null,
>> >     submit_time timestamp not null,
>> >     start_time timestamp not null,
>> >     end_time timestamp,
>> >     time_taken bigint,
>> >     job_log varchar(256) not null,
>> >     status char(1) default 'A' not null,
>> >     remarks varchar(512),
>> >     sched_job_id integer,
>> >
>> >     primary key(job_id),
>> >     check(status in('A', 'S', 'F', 'C'))
>> > );
>> >
>> > create index idx_job on job (proj_id desc, proj_name desc, proj_folder
>> > desc, start_time desc, end_time desc, run_user desc, time_taken desc,
>> > status desc, sched_job_id desc);
>> >
>> > This table would typically have a few hundred thousand records
>> > (~500,000). Records should be searchable by a combination of date
>> > range (start_time), project name, project folder, run user, status
>> > (one or more status codes using IN clause). To further clarify this
>> > search criteria, date range is always required and the rest of the
>> > parameters are optional.
>> >
>> > There is also a need to search records by job_id alone as well as
>> > sched_job_id alone. In the first case, there will be at most one
>> > result, but in the second case there could be several.
>> >
>> > The search results should be sortable pretty much by any column except
>> > remarks and job log.
>> >
>> > We also have a need to display the job count based on the status code
>> > for today, yesterday, last week and last month. In other words, the
>> > last week statistics would show how many jobs were successful, how
>> > many were failed etc. in the past one week.
>> >
>> > I'm wondering if the above index I've is good enough to perform all
>> > these functions fast enough. Please advise.
>> >
>> > Regards,
>> >
>> > Sai Pullabhotla
>> > Phone: (402) 408-5753
>> > Fax: (402) 408-6861
>> > www.jMethods.com
>> >
>
>
>
>

RE: Need Advice on Indexes

Posted by de...@segel.com.

Actually its not all that hard. :-)

<soap box> Too many people are acting like 'physical' DBAs without
understanding how a relational database or index works. Are there any good
online tutorials which will 'teach' the basics? </soap box>

<caveat> What I am writing is a generic statement or guideline to
table/index design in general. It is not specific to derby/cloudscape/javadb
and should be taken with a grain of salt. I am sure that there will be
someone who disagrees and will tell you something different. As always YMMV.
</caveat>

Just on the surface a couple of things ...

First, even without knowing your application or the rest of your database, I
will say that you first need to normalize your data. Performance wise, this
is pretty important.

You're mixing and matching project information and job information in to one
table. Also, just a hint. Its more efficient to store a user_id than the
entire user name. (At most, a 4 byte word vs a varchar of 1-4 byte
characters per character. [Unicode UTF-8 character set for
internationalization]) Even considering ASCII, 'John Smith' on every row
takes up more space than employee_id 112. And since you're tracking both the
submit user id and run user id, that's twice the savings per row. ;-)
So you'll want to create look up tables and create more of a star schema
than just your flat table. 

Second, your index is almost as large as your table. You're creating an
index on what looks like 10 of the 16 columns? (I probably miscounted.) 
This isn't a good idea. You'll see this sort of thing done because some
engines (maybe all) will only search the index if all of the columns of the
query are also in the index, so the idea is that you'll save yourself some
seek times.

Just a rule of thumb, in relational databases you really don't want to make
it a habit of joining more than 3 or 4 tables together. With respect to
indexes, when you get beyond 3 or 4 columns you're not going to get much
more unique values.  The key is to use the index to limit your search for
values, however once you get down to a reasonable subset, the database will
be doing a sequential scan of your subset. In fact another rule of thumb is
that if your table has less than 10,000 rows, most engines do not use an
index.

Another issue. Having large indexes will kill your performance on inserts
and updates. (You add/change a row in the table, your index has to be
changed as well.) So unless you have a situation where you have lots of
reads and little updates/inserts, you're going to have performance issues.

Also, in general, when you do a query on a table, most databases will only
use one index. (Some databases like XPS or other data warehouse centric
databases will use multiple indexes on the data.)
The other big issue is that if your query is sorting on the 3rd or 4th
column of the index, most likely the optimizer will not use the index.
Using your index, that means if you have a statement like:

SELECT *
FROM   job
WHERE  submit_user = "JOHN DOE"

The database will not use the index but will do a sequential table scan.

In looking at your example, its easy to see that you could split this in to
a couple of different indexes.

As Rick points out, without knowing your queries, its hard to say what
indexes you'll need.

First, split the table to normalize the data.
Figure out the indexes needed for joining the table and also how you're
going to query for a job.

In addition... if you're really looking for the best performance, think
about how often you'll actually look at a column in the join. You have a lot
of varchar columns. Are they going to be long enough? What's the minimum
you're going to be storing in the actual table (min length usually gets
stored in the table while the rest is referenced from a different
tablespace/page). If this table is constantly being used, but you're not
actually looking at the remarks or comments (large varchar columns), you'll
want to think about putting them in a separate table. Or even converting
them to a text blob so that you don't have an artificial limitation.
(Assuming that you'll want or need more that 512 characters in a comment.)
Table width is still a factor, albeit, with each generation of advancements
in hard drive, memory, I/O, this becomes less and less of an issue.

The separating out the varchar is really, really necessary if you're doing a
lot of look ups on the non varchar columns and need to squeeze the most out
of your performance. 

Does this help?

BTW, I think its great that you're thinking about compound indexes. (indexes
using multiple columns.) Too often developers only think about a single
column or the framework only creates a single column index for you. Then you
end up with a lot of indexes and when you create your own indexes that are
compound indexes, you'll need to add optimizer hints in your query
statements. (If your database supports optimizer hints.)

-Mike

PS. Sorry if this seems a bit disjointed. I'm trying to write this as my
wife is bossing me around the house. :-)

Happy Holidays!

> -----Original Message-----
> From: Richard.Hillegas@Sun.COM [mailto:Richard.Hillegas@Sun.COM]
> Sent: Wednesday, December 24, 2008 7:50 AM
> To: Derby Discussion
> Subject: Re: Need Advice on Indexes
> 
> Hi Sai,
> 
> It's hard to answer your question without seeing your queries. However,
> for an index to be useful, some subset of the restricted columns in your
> query must form a leading prefix of the indexed columns. So for
> instance, your index would not be useful for the following query:
> 
>   select * from job where proj_name  = 'Plan 9'
> 
> because the leading column of the index, proj_id, is not restricted by
> the query. If the timestamp ranges are always necessary in your queries,
> then I would make them the leading columns in the index.
> 
> Hope this helps,
> -Rick
> 
> Sai Pullabhotla wrote:
> > Dear Developers,
> >
> > I currently have the following table and index in a Derby database.
> >
> > create table job(
> > 	job_id bigint not null,
> > 	proj_id integer not null,
> > 	proj_name varchar(50) not null,
> > 	proj_folder varchar(256) not null,
> > 	submit_user varchar(20) not null,
> > 	run_user varchar(20) not null,
> > 	queue_priority integer not null,
> > 	run_priority integer not null,
> > 	submit_time timestamp not null,
> > 	start_time timestamp not null,
> > 	end_time timestamp,
> > 	time_taken bigint,
> > 	job_log varchar(256) not null,
> > 	status char(1) default 'A' not null,
> > 	remarks varchar(512),
> > 	sched_job_id integer,
> >
> > 	primary key(job_id),
> > 	check(status in('A', 'S', 'F', 'C'))
> > );
> >
> > create index idx_job on job (proj_id desc, proj_name desc, proj_folder
> > desc, start_time desc, end_time desc, run_user desc, time_taken desc,
> > status desc, sched_job_id desc);
> >
> > This table would typically have a few hundred thousand records
> > (~500,000). Records should be searchable by a combination of date
> > range (start_time), project name, project folder, run user, status
> > (one or more status codes using IN clause). To further clarify this
> > search criteria, date range is always required and the rest of the
> > parameters are optional.
> >
> > There is also a need to search records by job_id alone as well as
> > sched_job_id alone. In the first case, there will be at most one
> > result, but in the second case there could be several.
> >
> > The search results should be sortable pretty much by any column except
> > remarks and job log.
> >
> > We also have a need to display the job count based on the status code
> > for today, yesterday, last week and last month. In other words, the
> > last week statistics would show how many jobs were successful, how
> > many were failed etc. in the past one week.
> >
> > I'm wondering if the above index I've is good enough to perform all
> > these functions fast enough. Please advise.
> >
> > Regards,
> >
> > Sai Pullabhotla
> > Phone: (402) 408-5753
> > Fax: (402) 408-6861
> > www.jMethods.com
> >

Re: Need Advice on Indexes

Posted by Rick Hillegas <Ri...@Sun.COM>.

Hi Sai,

It's hard to answer your question without seeing your queries. However, 
for an index to be useful, some subset of the restricted columns in your 
query must form a leading prefix of the indexed columns. So for 
instance, your index would not be useful for the following query:

  select * from job where proj_name  = 'Plan 9'

because the leading column of the index, proj_id, is not restricted by 
the query. If the timestamp ranges are always necessary in your queries, 
then I would make them the leading columns in the index.

Hope this helps,
-Rick

Sai Pullabhotla wrote:
> Dear Developers,
>
> I currently have the following table and index in a Derby database.
>
> create table job(
> 	job_id bigint not null,
> 	proj_id integer not null,
> 	proj_name varchar(50) not null,
> 	proj_folder varchar(256) not null,
> 	submit_user varchar(20) not null,
> 	run_user varchar(20) not null,
> 	queue_priority integer not null,
> 	run_priority integer not null,
> 	submit_time timestamp not null,
> 	start_time timestamp not null,
> 	end_time timestamp,
> 	time_taken bigint,
> 	job_log varchar(256) not null,
> 	status char(1) default 'A' not null,
> 	remarks varchar(512),
> 	sched_job_id integer,
> 	
> 	primary key(job_id),
> 	check(status in('A', 'S', 'F', 'C'))
> );
>
> create index idx_job on job (proj_id desc, proj_name desc, proj_folder
> desc, start_time desc, end_time desc, run_user desc, time_taken desc,
> status desc, sched_job_id desc);
>
> This table would typically have a few hundred thousand records
> (~500,000). Records should be searchable by a combination of date
> range (start_time), project name, project folder, run user, status
> (one or more status codes using IN clause). To further clarify this
> search criteria, date range is always required and the rest of the
> parameters are optional.
>
> There is also a need to search records by job_id alone as well as
> sched_job_id alone. In the first case, there will be at most one
> result, but in the second case there could be several.
>
> The search results should be sortable pretty much by any column except
> remarks and job log.
>
> We also have a need to display the job count based on the status code
> for today, yesterday, last week and last month. In other words, the
> last week statistics would show how many jobs were successful, how
> many were failed etc. in the past one week.
>
> I'm wondering if the above index I've is good enough to perform all
> these functions fast enough. Please advise.
>
> Regards,
>
> Sai Pullabhotla
> Phone: (402) 408-5753
> Fax: (402) 408-6861
> www.jMethods.com
>