You are viewing a plain text version of this content. The canonical link for it is here.
Posted to modperl@perl.apache.org by Murali V <mu...@diffsoft.com> on 2001/04/19 14:43:43 UTC
Re: [OT] Re: Fast DB access

Thanks for pointing out the mistake in postgres.
Your Advice makes lots of sense.

V Murali
----- Original Message -----
From: Cees Hek <ce...@sitesuite.net>
To: Murali V <mu...@diffsoft.com>
Cc: <mo...@apache.org>
Sent: Friday, April 20, 2001 1:45 AM
Subject: [OT] Re: Fast DB access


>
> On Thu, 19 Apr 2001, Murali V wrote:
>
> > Hi,
> >
> > If you read the code more deeply, you'll find that the timeit is only
> > wrapped around select and not around insert.
> > We've written the insert code so that in the first round you can
populate
> > the database.
> > You comment out the insert code after the first round and run the
benchmark
> > several times. This would only do select and time select.
> >
>
> Hi Murali,
>
> OK, to start off, I was not specifically aiming my rant at you, I was
> replying to someone who had modified your code and was now comparing MySQL
> and PostgreSQL, and he was implying that the timings were for inserts and
> selects.  I took this at face value, and didn't check the code close
> enough which I really should have done in the first place.
>
> > Connecting this error to an axiom that "Benchmarks are useless" is bad
> > indeed. Shouldn't we be ironing out errors and runing benchmarks which
are
> > good.
>
> Perhaps I should have said published benchmarks.  In your case, you are
> using benchmarks for exactly what they are intended for...  Creating a
> system that closely resembles your application and putting it through it's
> paces.  What I find dangerous about publishing benchmarks, is that they
> are almost always heavily swayed to a specific application, and most of
> the time they show what the user wants them to show.
>
> In your original message, you clain to have a bias against Postgres, and
> your benchmark shows that bias.  I however am a happy user of postgres,
> and am therefore biased towards it.  I modified your benchmark script
> slightly, and I got the following results (I have include a diff of my
> changes at the bottom):
>
> postgres
>  0 wallclock secs ( 0.02 usr + 0.01 sys = 0.03 CPU)
> postgres
>  0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
>
> Whereas if I run it with your version I get the following:
>
> postgres
> 27 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
> postgres
> 27 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)
>
> So what does that tell you about the benchmark?  that the postgres part of
> this benchmark is useless...  It may have given you the answer that you
> wanted, but it is misleading to anyone else out there.
>
> This is why there are always flame wars about benchmarking databases (by
> the way I think this whole thread has been very civilized and i hope is
> stays that way).  Invariably the benchmark has missed some critical idea
> or optimization which drastically skew the results.
>
> > Your recommendation is to pick a DB best suited to your app. But How ??
> > a) Either by hiring a guru who has seen all kinds of apps with different
DBs
> > who can give you the answer with which we can run
> > b) Run a benchmark on critical programs which represent you app across
> > databases and find what performs best.
> > I've read too much literature on DB features. All DBs have all features
> > (except MySQL which does not have commit !!!!)
> > You can't make a thing out of DB literature.
>
> What I would recommend is exactly what you have done in this case.  Get
> access to any and all the systems that you feel may do the job for you ,
> and try them out.  Browse the web for other users experiences, but don't
> use other peoples benchmarks, because the odds are good that they are
> wrong...  Create your own, or modify an existing one, and scrutinize
> exactly what it is doing.  And if you want to share your results with
> anyone else, tell them what you choose in the end, and why.  Tell them you
> choose database x because it did this and this for you.  Don't say
> database y is a piece of crap, so we went with database x.
>
> But whatever you do, don't choose your database based on other peoples
> benchmarks........ (that is all I'm trying to say, and I guess I didn't
> say it clearly enough)
>
> When I first read your message, I tucked it away somewhere, so I could
> reference it again in the future, because I was interested in the MLDBM
> work that you had done, and I thank you for that.  But it also made me
> think that maybe I shouldn't be using Postgres, because your results were
> so poor (only for a second or too though :).  But I'll bet that a lot of
> people who have never used postgres before are now less likely to download
> it and try it out for themself, because a benchmark swayed them away from
> it.  That sounds like a good closer, so I'll stop it there :-)
>
>
> Cees
>
>
> Here is the diff of my changes and a quick comment on why your way kills
> the performance of postgres
>
> ***************
> *** 124,131 ****
>                 $i_ip = int(rand(20));
>
>                 @row_ary = $dbh->selectrow_array("select crr from
benchmark where
> !                               rtrim(pub) = 'pub$i_pub' and rtrim(size) =
'size$i_size' and
> !                               rtrim(type) = 'type$i_type' and rtrim(ip)
= 'ip$i_ip'");
>         }
>   };
>
> --- 124,131 ----
>                 $i_ip = int(rand(20));
>
>                 @row_ary = $dbh->selectrow_array("select crr from
benchmark where
> !                               pub = 'pub$i_pub' and size = 'size$i_size'
and
> !                               type = 'type$i_type' and ip = 'ip$i_ip'");
>         }
>   };
>
> All I did was remove the 'rtrim' called from the query.  I'm assuming you
> did this because Postgres pads all char fields with spaces (varchar
> doesn't do this).  What your query ends up doing, is running rtrim on
> every field in the column and then comparing it against your provided
> value (indeces are not used since you are changing the value of the
> primary key).  So since your table has 24000 rows and you call rtrim on 4
> columns in the select, it has a worst case of 24,000 x 4 calls to rtrim
> (of course postgres will short circuit the conditions if the first call
> fails), but in the best case it still has to do 24,000 calls to rtrim.
> I'm amazed that postgres can actually do this this fast :).
>
> In Postgres 7.0 you don't have to worry about the space padding issue, so
> I was able to remove the rtrims from the query.  However since you are
> using 6.5.x, what you should have done, is either make your columns
> varchar instead of char, or space pad your values in the query, so that
> you are checking pub = 'pub1 ' instead of rtrim(pub) = 'pub1'.  i
>
> However, since you were looking at solutions for a new application, you
> have no excuse for using a database that is several years old, when
> several new stable revisions are available (for free)...
>
> Give it a try and see what happens...
>
>
>
>