You are viewing a plain text version of this content. The canonical link for it is here.
Posted to modperl@perl.apache.org by James G Smith <JG...@TAMU.Edu> on 2002/08/21 19:04:53 UTC

Re: Apache::Session - What goes in session?

"Jesse Erlbaum" <je...@erlbaum.net> wrote:
>Hi Peter --
>
>> > The morale of the story:  Flat files rock!  ;-)
>>
>> If I'm using Apache::DBI so I have a persistent connection to MySQL,
>> would it not be faster to simply use a table in MySQL?
>
>
>Unlikely.  Even with cached database connections you are probably not going
>to beat the performance of going to a flat text file.  Accessing files is
>something the OS is optimized to do.  The process of issuing a SQL query,
>having it parsed and retrieving results is probably more time-consuming than
>you think.

All depends on the file structure.  A linear search through a
thousand records can be slower than a binary search through a million
(500 ave. compares vs. about 20 max [10 ave.] compares - hope the
extra overhead for the binary search is worth the savings in
comparisons).

>One way to think about it is this:  MySQL stores its data in files.  There
>are many layers of code between DBI and those files, each of which add
>processing time.  Going directly to files is far less code, and less code is
>most often faster code.

MySQL also stores indices.  As soon as you start having to store data
in files and maintain indices, you might as well start using a
database.

>The best way to be cure is to benchmark the difference yourself.  Try out
>the Benchmark module.  Quantitative data trumps anecdotal data every time.

Definitely.  But before you do, make sure the proper indices are
created on the MySQL side.  Wrong database configurations can kill
any performance gain.
-- 
James Smith <JG...@TAMU.Edu>, 979-862-3725
Texas A&M CIS Operating Systems Group, Unix

Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
Lupe Christoph wrote:

>On Friday, 2002-08-23 at 11:28:10 +0200, lupe wrote:
>  
>
>>... A few years ago, I
>>benchmarked Apache::Session's diverse session stores, and MySQL
>>came out top. (I can't find my scripts for that anymopre, alas.)
>>    
>>
>
>Found it. The script is attached. Here are the results I had on Solaris
>x86 (probably 2.6 at that time). IIRC, MySQL with semaphore lock did not
>work. Maybe it does now. I did this on 24 July 2000.
>

That's an awfully slow file system on Solaris!  My benchmarks were all 
on Linux.  Maybe Solaris is doing synchronous writes in this test.  I 
suspect Postgres would do a lot better these days than it did in this test.

I'll try to put my TPC presentation and benchmarks up this weekend.

- Perrin


Re: Apache::Session - What goes in session?

Posted by Lupe Christoph <lu...@lupe-christoph.de>.
On Friday, 2002-08-23 at 11:28:10 +0200, lupe wrote:
> ... A few years ago, I
> benchmarked Apache::Session's diverse session stores, and MySQL
> came out top. (I can't find my scripts for that anymopre, alas.)

Found it. The script is attached. Here are the results I had on Solaris
x86 (probably 2.6 at that time). IIRC, MySQL with semaphore lock did not
work. Maybe it does now. I did this on 24 July 2000.

Benchmark: timing 1000 iterations of Create DB_File, Create DB_File Sema, Create File, Create File Sema, Create MySQL, Create MySQL Onehandle, Create PostgreSQL...
Create DB_File: 54 wallclock secs ( 2.14 usr +  2.55 sys =  4.69 CPU)
Create DB_File Sema: 24 wallclock secs ( 2.14 usr +  2.01 sys =  4.15 CPU)
Create File: 45 wallclock secs ( 1.83 usr +  5.69 sys =  7.52 CPU)
Create File Sema: 28 wallclock secs ( 2.09 usr +  4.42 sys =  6.51 CPU)
Create MySQL: 21 wallclock secs ( 6.63 usr +  2.87 sys =  9.50 CPU)
Create MySQL Onehandle:  6 wallclock secs ( 2.23 usr +  0.68 sys =  2.91 CPU)
Create PostgreSQL: 185 wallclock secs ( 5.37 usr +  1.91 sys =  7.28 CPU)
----------------------------------------------------------------

Benchmark: timing 1000 iterations of Retrieve DB_File, Retrieve DB_File Sema, Retrieve File, Retrieve File Sema, Retrieve MySQL, Retrieve MySQL Onehandle, Retrieve PostgreSQL...
Retrieve DB_File: 12 wallclock secs ( 1.64 usr +  1.60 sys =  3.24 CPU)
Retrieve DB_File Sema: 11 wallclock secs ( 1.74 usr +  1.75 sys =  3.49 CPU)
Retrieve File:  4 wallclock secs ( 2.99 usr +  0.56 sys =  3.55 CPU)
Retrieve File Sema:  4 wallclock secs ( 3.08 usr +  0.43 sys =  3.51 CPU)
Retrieve MySQL: 19 wallclock secs ( 6.50 usr +  3.04 sys =  9.54 CPU)
Retrieve MySQL Onehandle:  6 wallclock secs ( 2.66 usr +  0.81 sys =  3.47 CPU)
Retrieve PostgreSQL: 182 wallclock secs ( 4.60 usr +  2.02 sys =  6.62 CPU)
----------------------------------------------------------------

Benchmark: timing 1000 iterations of Store DB_File, Store DB_File Sema, Store File, Store File Sema, Store MySQL, Store MySQL Onehandle, Store PostgreSQL...
Store DB_File: 25 wallclock secs ( 2.15 usr +  2.23 sys =  4.38 CPU)
Store DB_File Sema: 23 wallclock secs ( 2.47 usr +  2.14 sys =  4.61 CPU)
Store File: 13 wallclock secs ( 5.40 usr +  1.03 sys =  6.43 CPU)
Store File Sema: 13 wallclock secs ( 5.53 usr +  0.96 sys =  6.49 CPU)
Store MySQL: 21 wallclock secs ( 7.39 usr +  3.05 sys = 10.44 CPU)
Store MySQL Onehandle:  8 wallclock secs ( 2.83 usr +  0.88 sys =  3.71 CPU)
Store PostgreSQL: 197 wallclock secs ( 6.67 usr +  2.26 sys =  8.93 CPU)
----------------------------------------------------------------

Benchmark: timing 1000 iterations of Load DB_File, Load DB_File Sema, Load File, Load File Sema, Load MySQL, Load MySQL Onehandle, Load PostgreSQL...
Load DB_File: 12 wallclock secs ( 1.57 usr +  1.95 sys =  3.52 CPU)
Load DB_File Sema: 12 wallclock secs ( 1.75 usr +  1.87 sys =  3.62 CPU)
 Load File:  5 wallclock secs ( 2.86 usr +  0.62 sys =  3.48 CPU)
Load File Sema:  4 wallclock secs ( 3.10 usr +  0.48 sys =  3.58 CPU)
Load MySQL: 20 wallclock secs ( 6.35 usr +  2.86 sys =  9.21 CPU)
Load MySQL Onehandle:  6 wallclock secs ( 2.25 usr +  0.61 sys =  2.86 CPU)
Load PostgreSQL: 179 wallclock secs ( 5.73 usr +  1.87 sys =  7.60 CPU)
----------------------------------------------------------------

Benchmark: timing 1000 iterations of Update DB_File, Update DB_File Sema, Update File, Update File Sema, Update MySQL, Update MySQL Onehandle, Update PostgreSQL...
Update DB_File: 23 wallclock secs ( 2.15 usr +  2.14 sys =  4.29 CPU)
Update DB_File Sema: 22 wallclock secs ( 2.28 usr +  2.20 sys =  4.48 CPU)
Update File: 11 wallclock secs ( 5.16 usr +  1.04 sys =  6.20 CPU)
Update File Sema: 11 wallclock secs ( 5.08 usr +  1.04 sys =  6.12 CPU)
Update MySQL: 23 wallclock secs ( 7.20 usr +  3.27 sys = 10.47 CPU)
Update MySQL Onehandle:  8 wallclock secs ( 2.86 usr +  1.02 sys =  3.88 CPU)
Update PostgreSQL: 202 wallclock secs ( 6.09 usr +  2.19 sys =  8.28 CPU)
----------------------------------------------------------------

Benchmark: timing 1000 iterations of Store4 DB_File, Store4 DB_File Sema, Store4 File, Store4 File Sema, Store4 MySQL, Store4 MySQL Onehandle, Store4 PostgreSQL...
Store4 DB_File: 25 wallclock secs ( 2.27 usr +  3.04 sys =  5.31 CPU)
Store4 DB_File Sema: 24 wallclock secs ( 2.35 usr +  3.38 sys =  5.73 CPU)
Store4 File: 22 wallclock secs ( 5.37 usr +  1.07 sys =  6.44 CPU)
Store4 File Sema: 22 wallclock secs ( 5.61 usr +  0.99 sys =  6.60 CPU)
Store4 MySQL: 21 wallclock secs ( 7.72 usr +  3.12 sys = 10.84 CPU)
Store4 MySQL Onehandle:  8 wallclock secs ( 3.08 usr +  0.99 sys =  4.07 CPU)
Store4 PostgreSQL: 195 wallclock secs ( 6.22 usr +  1.85 sys =  8.07 CPU)
----------------------------------------------------------------

Benchmark: timing 1000 iterations of Update4 DB_File, Update4 DB_File Sema, Update4 File, Update4 File Sema, Update4 MySQL, Update4 MySQL Onehandle, Update4 PostgreSQL...
Update4 DB_File: 22 wallclock secs ( 2.27 usr +  2.97 sys =  5.24 CPU)
Update4 DB_File Sema: 22 wallclock secs ( 2.44 usr +  3.27 sys =  5.71 CPU)
Update4 File: 11 wallclock secs ( 5.32 usr +  0.91 sys =  6.23 CPU)
Update4 File Sema: 12 wallclock secs ( 5.76 usr +  0.85 sys =  6.61 CPU)
Update4 MySQL: 23 wallclock secs ( 7.61 usr +  3.21 sys = 10.82 CPU)
Update4 MySQL Onehandle:  8 wallclock secs ( 3.36 usr +  0.75 sys =  4.11 CPU)
Update4 PostgreSQL: 199 wallclock secs ( 6.31 usr +  2.13 sys =  8.44 CPU)

HTH,
Lupe Christoph
-- 
| lupe@lupe-christoph.de       |           http://www.lupe-christoph.de/ |
| Big Misunderstandings #6398: The Titanic was not supposed to be        |
| unsinkable. The designer had a speech impediment. He said: "I have     |
| thith great unthinkable conthept ..."                                  |

Re: Apache::Session - What goes in session?

Posted by Lupe Christoph <lu...@lupe-christoph.de>.
On Thursday, 2002-08-22 at 14:39:51 -0400, Perrin Harkins wrote:
> James G Smith wrote:
> >MySQL also stores indices.  As soon as you start having to store data
> >in files and maintain indices, you might as well start using a
> >database.

> The filesystem and BerkeleyDB both use BTree indices to find the record 
> you want.  MySQL indices are not faster.  A database like MySQL would be 
> the clear choice if you had something which required queries more 
> complex than fetching a single record by ID, but we're talking about 
> sessions and caches here.  These are essentially hash structures, and 
> they are exactly what filesystems and BerkeleyDB are designed to be good at.

Many filesystems don't use anything better than unsorted lists for
directories. And they use synchronous operations for creation and
removal of files to guarantee consistency (ext2 doesn't). I've just
timed the creation of 10000 files on a Sun E250, local 10000RPM disk.
It took 3:30 minutes. I'd bet creating 10000 records in MySQL takes
less time (but I'm too lazy to benchmark). A few years ago, I
benchmarked Apache::Session's diverse session stores, and MySQL
came out top. (I can't find my scripts for that anymopre, alas.)

Lupe Christoph
-- 
| lupe@lupe-christoph.de       |           http://www.lupe-christoph.de/ |
| Big Misunderstandings #6398: The Titanic was not supposed to be        |
| unsinkable. The designer had a speech impediment. He said: "I have     |
| thith great unthinkable conthept ..."                                  |

Re: Apache::Session - What goes in session?

Posted by Perrin Harkins <pe...@elem.com>.
James G Smith wrote:
> MySQL also stores indices.  As soon as you start having to store data
> in files and maintain indices, you might as well start using a
> database.

The filesystem and BerkeleyDB both use BTree indices to find the record 
you want.  MySQL indices are not faster.  A database like MySQL would be 
the clear choice if you had something which required queries more 
complex than fetching a single record by ID, but we're talking about 
sessions and caches here.  These are essentially hash structures, and 
they are exactly what filesystems and BerkeleyDB are designed to be good at.

- Perrin


RE: Apache::Session - What goes in session?

Posted by Jesse Erlbaum <je...@erlbaum.net>.
Hey James --

> >One way to think about it is this:  MySQL stores its data in
> files.  There
> >are many layers of code between DBI and those files, each of which add
> >processing time.  Going directly to files is far less code, and
> less code is
> >most often faster code.
>
> MySQL also stores indices.  As soon as you start having to store data
> in files and maintain indices, you might as well start using a
> database.


You bring up a really important point: Scale.  If a custom file-based data
storage system starts growing in both size and functionality it will sooner
or latter reach a point where it is a far worse solution.

Relational databases are optimized for two things:  Ease of access and
management of large data sets.  If the data set is small and the functional
requirements are very narrow then a custom system can outperform a database
most of the time (not including poorly written systems!).  Once you have to
deal with large amounts of data, or you need to have an interface which
allows customizable retrieval of sub-sets of data (a la SQL), a database is
going to be the way to go.

The trick is knowing which path to choose.  Having an idea of the potential
growth of the system and use of the data, combined with a few well chosen
benchmarks come in handy here.

TTYL,

-Jesse-


--

  Jesse Erlbaum
  The Erlbaum Group
  jesse@erlbaum.net
  Phone: 212-684-6161
  Fax: 212-684-6226