Posted to dev@subversion.apache.org by Jan Horák <ho...@gmail.com> on 2010/04/06 08:19:26 UTC

Changes in SQL backend design

Hi, I've made some changes to the SQL backend design; the updated design is at:
http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100405_mysql_design_v2.png

As Philipp M. proposed, representations now carry a SHA1, but in a
slightly different way than in BDB. BDB keeps it in a separate table,
which in SQL would mean an unnecessary join.
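
A minimal sketch of that idea, assuming a `representations` table that
carries the SHA1 as an ordinary indexed column. The table and column names
are hypothetical (they are not taken from the design diagram), and SQLite
stands in for MySQL only to keep the example self-contained:

```python
import hashlib
import sqlite3

# Hypothetical schema: the SHA1 lives in the representations table itself,
# so a checksum lookup needs no join against a second table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE representations (
        rep_id   INTEGER PRIMARY KEY,
        sha1     TEXT NOT NULL,
        contents BLOB NOT NULL
    )
""")
conn.execute("CREATE INDEX representations_sha1 ON representations (sha1)")


def add_representation(contents: bytes) -> int:
    """Store contents together with its SHA1 and return the new rep_id."""
    digest = hashlib.sha1(contents).hexdigest()
    cur = conn.execute(
        "INSERT INTO representations (sha1, contents) VALUES (?, ?)",
        (digest, contents),
    )
    return cur.lastrowid


def find_by_sha1(digest: str):
    """Return the rep_id of a representation with this SHA1, or None."""
    row = conn.execute(
        "SELECT rep_id FROM representations WHERE sha1 = ?", (digest,)
    ).fetchone()
    return row[0] if row else None


rep_id = add_representation(b"hello")
found = find_by_sha1(hashlib.sha1(b"hello").hexdigest())
missing = find_by_sha1("0" * 40)
```

The lookup is a single indexed SELECT on one table, which is the point of
keeping the checksum next to the representation.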

The txt_id fields were renamed to txn_id; that was just a mistake. Txnprop_id
was removed, and a composite primary key <txn_id, name> is used instead.
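
As an illustration only, the composite key could look like this. Only
`txn_id` and `name` come from the text above; the table name, the `value`
column and the use of SQLite instead of MySQL are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: no surrogate txnprop_id, the pair (txn_id, name)
# identifies a transaction property.
conn.execute("""
    CREATE TABLE transaction_props (
        txn_id INTEGER NOT NULL,
        name   TEXT    NOT NULL,
        value  BLOB,
        PRIMARY KEY (txn_id, name)
    )
""")
conn.execute("INSERT INTO transaction_props VALUES (1, 'svn:log', 'first')")

# The same (txn_id, name) pair a second time violates the composite key.
try:
    conn.execute("INSERT INTO transaction_props VALUES (1, 'svn:log', 'second')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A different property name on the same transaction is accepted.
conn.execute("INSERT INTO transaction_props VALUES (1, 'svn:author', 'jan')")
row_count = conn.execute("SELECT COUNT(*) FROM transaction_props").fetchone()[0]
```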

Table names were kept in the plural, even though Geoff R. suggested using
the singular, because plurals correspond better with BDB. But if anybody
has good reasons, it's no problem to change it.

The last thing is the discussed storing of the representations in files
rather than in the DB. I ran a simple test comparing the access speed of the
database and of the plain filesystem; the report is here:
http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100406_data_access_speed_comparison.pdf

The conclusion is that SQL can be faster when many small files are
read for the first time (thanks to better caching); in other cases the
filesystem is faster, as expected. But Greg S. made a very good point
that I agree with:

 > My gut says "not that much faster". In most scenarios, the network
 > bandwidth between the client/server will be the bottleneck. Reading
 > the data off a disk (rather than from a DB) is not going to make the
 > WAN connection any faster.
...
 > I'd go with the "store content in the database" until performance
 > figures (or a DBA) demonstrates it is a problem.

So the data stays in the DB for now.

Re: Changes in SQL backend design

Posted by Jan Horák <ho...@gmail.com>.

On 6.4.2010 10:54, Senthil Kumaran S wrote:
> Jan Horák wrote:
>> The last thing is the discussed storing of the representations in 
>> files, not in DB. I've made a simple test of the access speed to 
>> database and to pure filesystem, the report is here:
>> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100406_data_access_speed_comparison.pdf 
>
>
> What is the unit of time here? Is it seconds, milliseconds, etc?
It is always seconds.
>
> Thank You.

Re: Changes in SQL backend design

Posted by Senthil Kumaran S <se...@collab.net>.

Jan Horák wrote:
> The last thing is the discussed storing of the representations in files, 
> not in DB. I've made a simple test of the access speed to database and 
> to pure filesystem, the report is here:
> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100406_data_access_speed_comparison.pdf 

What is the unit of time here? Is it seconds, milliseconds, etc?

Thank You.
-- 
Senthil Kumaran S
http://www.stylesen.org/

RE: Changes in SQL backend design

Posted by Bob Archer <Bo...@amsi.com>.

> I think it would be nice to store more repositories in one database, but
> I find this way a bit confusing and without significant advantages (e.g.
> it would be not clear which rows belong to which repository).

Please don't do this unless it is optional. Each repository should be self-contained. It would also be nice if it were OS-agnostic, so that, for example, I could move a repository from one server to another (Windows to Mac) just by copying over a SQLite database. Even all the config info should be in the database rather than in magic config files.

BOb

Re: Changes in SQL backend design

Posted by Jan Horák <ho...@gmail.com>.

I think it would be nice to store multiple repositories in one database, but
I find this approach a bit confusing and without significant advantages (e.g.
it would not be clear which rows belong to which repository).

However, a table prefix could be used for all tables; that would be
transparent, and it would make it easy, for example, to destroy a whole
repository (without an svn routine).

Jan

On 15.4.2010 14:30, Martin Furter wrote:
>
>
> On Tue, 6 Apr 2010, Jan Horák wrote:
>
>> Hi, I've made some changes to the SQL backend design; the updated design is at:
>> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100405_mysql_design_v2.png 
>>
>
> If you add a field "repository_name" to the transactions table it 
> would be possible to store multiple repositories in one database/schema.
>
> Martin

Re: Changes in SQL backend design

Posted by Martin Furter <mf...@rola.ch>.

On Tue, 6 Apr 2010, Jan Horák wrote:

> Hi, I've made some changes to the SQL backend design; the updated design is at:
> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100405_mysql_design_v2.png

If you add a field "repository_name" to the transactions table it would be 
possible to store multiple repositories in one database/schema.
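
A rough sketch of what that could look like, assuming a `transactions`
table with an added `repository_name` column. Everything except that column
name is hypothetical, and SQLite is used here only so the example runs on
its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transactions table; repository_name tags each row with
# the repository it belongs to.
conn.execute("""
    CREATE TABLE transactions (
        txn_id          INTEGER PRIMARY KEY,
        repository_name TEXT NOT NULL,
        revision        INTEGER
    )
""")
conn.execute("CREATE INDEX transactions_repo ON transactions (repository_name)")

conn.executemany(
    "INSERT INTO transactions (repository_name, revision) VALUES (?, ?)",
    [("projectA", 1), ("projectA", 2), ("projectB", 1)],
)


def revisions_of(repository_name: str) -> list:
    """Return the revisions recorded for one repository, in order."""
    rows = conn.execute(
        "SELECT revision FROM transactions "
        "WHERE repository_name = ? ORDER BY revision",
        (repository_name,),
    ).fetchall()
    return [r[0] for r in rows]


revs_a = revisions_of("projectA")
revs_b = revisions_of("projectB")
```

Every query then has to filter on `repository_name`, which is the cost of
sharing one schema between repositories.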

Martin

Re: Changes in SQL backend design

Posted by Jan Horák <ho...@gmail.com>.

Hello Phil,

On 12.4.2010 8:02, Philipp Marek wrote:
> Sorry for the delay; but reading the thread "Severe performance issues with
> large directories" I just remembered that the backend has a little bit of a
> problem with big directories - storage overhead.
>
> Do you see any way to split directories into a series of blocks (like files
> are done), and when changing only a few of the files using pointers to the
> unmodified blocks of the old directory?
>
> I don't propose a real delta design - that was too slow, IIRC.
> Just re-use of directory blocks; that shouldn't bring any performance issues.
>
>
> Is there some way to do that? Perhaps multiple "." entries in a directory,
> which just point to other parts?
>

I'm not sure we are thinking of the same issue, but I was thinking about a
kind of hash table. With a well-chosen table size it could presumably give
good results.

Cheers,

Jan

Re: Changes in SQL backend design

Posted by Jan Horák <ho...@gmail.com>.

On 16.4.2010 00:04, Greg Stein wrote:
>
> I'm away from my laptop, so please excuse the brevity...
>
> Yes, we can set you up with a branch. You will need to sign an ASF 
> CLA. Then we can go from there. I'll send more details later (unless 
> somebody does it first, or you find the CLA).
>
Thanks, there is no hurry; I will be away for a few days now.

Regards,

Jan
>> On Apr 15, 2010 5:58 PM, "Jan Horák" <horak.honza@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> On 14.4.2010 18:08, Greg Stein wrote:
>>
>>
>> >
>> > 2010/4/14 Philipp Marek <philipp.marek@emerion.com>:
>> >
>> >>
>> >> Hello Jan!
>> >>
>> >> On Dienstag, 13...
>>
>> First, sorry about my delays in answering, just too busy at the moment.
>>
>> On the one hand I like this solution, I find it clear and useful. But 
>> I agree with Greg on the other hand and I would be glad if some 
>> working prototype of SQL backend will become real in the following 
>> weeks/months. So I would rather not to complicate the present design 
>> and keep this idea to the future extending.
>>
>> It brings me to another point, I would like to begin to implement a 
>> prototype soon, so if it would be possible to create some devel. 
>> branch for that purpose, it would be great. Or is there anybody to 
>> ask for that directly?
>>
>> Regards,
>>
>> Jan
>>
>>
>>
>> > Or (3): go ahead and store megabytes for each directory, just like the
>> > other backends. And lea...
>>

Re: Changes in SQL backend design

Posted by Greg Stein <gs...@gmail.com>.

I'm away from my laptop, so please excuse the brevity...

Yes, we can set you up with a branch. You will need to sign an ASF CLA. Then
we can go from there. I'll send more details later (unless somebody does it
first, or you find the CLA).

On Apr 15, 2010 5:58 PM, "Jan Horák" <ho...@gmail.com> wrote:

Hi,

On 14.4.2010 18:08, Greg Stein wrote:

>
> 2010/4/14 Philipp Marek<ph...@emerion.com>:
>
>>
>> Hello Jan!
>>
>> On Dienstag, 13...
First, sorry about my delays in answering, just too busy at the moment.

On the one hand I like this solution, I find it clear and useful. But I
agree with Greg on the other hand and I would be glad if some working
prototype of SQL backend will become real in the following weeks/months. So
I would rather not to complicate the present design and keep this idea to
the future extending.

It brings me to another point, I would like to begin to implement a
prototype soon, so if it would be possible to create some devel. branch for
that purpose, it would be great. Or is there anybody to ask for that
directly?

Regards,

Jan

> Or (3): go ahead and store megabytes for each directory, just like the
> other backends. And lea...

Re: Changes in SQL backend design

Posted by Jan Horák <ho...@gmail.com>.

Hi,

On 14.4.2010 18:08, Greg Stein wrote:
> 2010/4/14 Philipp Marek <ph...@emerion.com>:
>
>> Hello Jan!
>>
>> On Tuesday, 13 April 2010, Jan Horák wrote:
>>
>>> On 12.4.2010 8:02, Philipp Marek wrote:
>>>        
>>>> Sorry for the delay; but reading the thread "Severe performance issues
>>>> with large directories" I just remembered that the backend has a little
>>>> bit of a problem with big directories - storage overhead.
>>>>
>>>> Do you see any way to split directories into a series of blocks (like
>>>> files are done), and when changing only a few of the files using pointers
>>>> to the unmodified blocks of the old directory?
>>>>
>>>> I don't propose a real delta design - that was too slow, IIRC.
>>>> Just re-use of directory blocks; that shouldn't bring any performance
>>>> issues.
>>>>
>>>>
>>>> Is there some way to do that? Perhaps multiple "." entries in a
>>>> directory, which just point to other parts?
>>>>          
>>> I'm not sure if we think the same issue, but I was thinking about a kind
>>> of hash table. Using
>>> a sophisticated table size it could bring good results, supposedly.
>>>        
>> Sorry, I didn't make myself clear.
>>
>> I didn't find the issue I'm talking about in the issue tracker; but the
>> problem is that the backends (FSFS, BDB) don't store directories deltified
>> (for performance reasons), and so modifying an entry in or below a big
>> directory has to re-write the whole directory - and that means several
>> megabytes, for big directories.
>>
>>
>> So I'd suggest to change the directory storage.
>> * Either use a new table, with fields like parent, name (or path),
>>   valid-from-revision, valid-before-revision or something like that;
>>   then changing an entry means only updating valid-before of the
>>   old record, and inserting a new one.
>> * Or, if you want to store directories in the same way as file data (like
>>   now in FSFS and BDB), I'd suggest to limit such blocks of directory data
>>   to a few KB, but to define an indirect-block that tells which blocks are
>>   used.
>>   A new entry could then reference all the unchanged blocks of the older
>>   revision.
>>      
First, sorry for my delays in answering; I'm just too busy at the moment.

On the one hand, I like this solution; I find it clear and useful. On the
other hand, I agree with Greg, and I would be glad if a working prototype
of the SQL backend became real in the coming weeks or months. So I would
rather not complicate the present design, and would keep this idea for a
future extension.

That brings me to another point: I would like to start implementing a
prototype soon, so it would be great if a development branch could be
created for that purpose. Or is there somebody I should ask about that
directly?

Regards,

Jan

> Or (3): go ahead and store megabytes for each directory, just like the
> other backends. And leave the solution of this problem to a future
> iteration of the SQL-based backend.
>
> Really... optimizing before you even get started is not advisable. Get
> something done. THEN examine and iterate. There could be numerous
> other problems inherent in a SQL backend that would obviate any such
> "solution" proposed today.
>
> Also, the "SQL backend" concept has been started several times before,
> and abandoned. I don't want to see it get abandoned AGAIN because the
> initial "solutions" make it overly complicated before it can even
> begin.
>
> Cheers,
> -g
>

Re: Changes in SQL backend design

Posted by Greg Stein <gs...@gmail.com>.

2010/4/14 Philipp Marek <ph...@emerion.com>:
> Hello Jan!
>
> On Tuesday, 13 April 2010, Jan Horák wrote:
>> On 12.4.2010 8:02, Philipp Marek wrote:
>> > Sorry for the delay; but reading the thread "Severe performance issues
>> > with large directories" I just remembered that the backend has a little
>> > bit of a problem with big directories - storage overhead.
>> >
>> > Do you see any way to split directories into a series of blocks (like
>> > files are done), and when changing only a few of the files using pointers
>> > to the unmodified blocks of the old directory?
>> >
>> > I don't propose a real delta design - that was too slow, IIRC.
>> > Just re-use of directory blocks; that shouldn't bring any performance
>> > issues.
>> >
>> >
>> > Is there some way to do that? Perhaps multiple "." entries in a
>> > directory, which just point to other parts?
>> I'm not sure if we think the same issue, but I was thinking about a kind
>> of hash table. Using
>> a sophisticated table size it could bring good results, supposedly.
> Sorry, I didn't make myself clear.
>
> I didn't find the issue I'm talking about in the issue tracker; but the
> problem is that the backends (FSFS, BDB) don't store directories deltified
> (for performance reasons), and so modifying an entry in or below a big
> directory has to re-write the whole directory - and that means several
> megabytes, for big directories.
>
>
> So I'd suggest to change the directory storage.
> * Either use a new table, with fields like parent, name (or path),
>  valid-from-revision, valid-before-revision or something like that;
>  then changing an entry means only updating valid-before of the
>  old record, and inserting a new one.
> * Or, if you want to store directories in the same way as file data (like
>  now in FSFS and BDB), I'd suggest to limit such blocks of directory data
>  to a few KB, but to define an indirect-block that tells which blocks are
>  used.
>  A new entry could then reference all the unchanged blocks of the older
>  revision.


Or (3): go ahead and store megabytes for each directory, just like the
other backends. And leave the solution of this problem to a future
iteration of the SQL-based backend.

Really... optimizing before you even get started is not advisable. Get
something done. THEN examine and iterate. There could be numerous
other problems inherent in a SQL backend that would obviate any such
"solution" proposed today.

Also, the "SQL backend" concept has been started several times before,
and abandoned. I don't want to see it get abandoned AGAIN because the
initial "solutions" make it overly complicated before it can even
begin.

Cheers,
-g

Re: Changes in SQL backend design

Posted by Philipp Marek <ph...@emerion.com>.

Hello Jan!

On Tuesday, 13 April 2010, Jan Horák wrote:
> On 12.4.2010 8:02, Philipp Marek wrote:
> > Sorry for the delay; but reading the thread "Severe performance issues
> > with large directories" I just remembered that the backend has a little
> > bit of a problem with big directories - storage overhead.
> >
> > Do you see any way to split directories into a series of blocks (like
> > files are done), and when changing only a few of the files using pointers
> > to the unmodified blocks of the old directory?
> >
> > I don't propose a real delta design - that was too slow, IIRC.
> > Just re-use of directory blocks; that shouldn't bring any performance
> > issues.
> >
> >
> > Is there some way to do that? Perhaps multiple "." entries in a
> > directory, which just point to other parts?
> I'm not sure if we think the same issue, but I was thinking about a kind
> of hash table. Using
> a sophisticated table size it could bring good results, supposedly.
Sorry, I didn't make myself clear.

I didn't find the issue I'm talking about in the issue tracker; but the 
problem is that the backends (FSFS, BDB) don't store directories deltified 
(for performance reasons), and so modifying an entry in or below a big 
directory has to re-write the whole directory - and that means several 
megabytes, for big directories.

So I'd suggest changing the directory storage.
* Either use a new table, with fields like parent, name (or path),
  valid-from-revision, valid-before-revision or something like that;
  then changing an entry means only updating valid-before of the 
  old record, and inserting a new one.
* Or, if you want to store directories in the same way as file data (like
  now in FSFS and BDB), I'd suggest to limit such blocks of directory data
  to a few KB, but to define an indirect-block that tells which blocks are
  used.
  A new entry could then reference all the unchanged blocks of the older
  revision.
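
The first option above could be sketched roughly as follows. This is only
an illustration under assumptions: the table name `dir_entries`, the
`node_id` column, the helper functions and the use of SQLite are all
hypothetical; only parent, name, valid-from and valid-before come from the
list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per directory entry per validity range.
# valid_before IS NULL means "still valid in the youngest revision".
conn.execute("""
    CREATE TABLE dir_entries (
        parent       TEXT    NOT NULL,
        name         TEXT    NOT NULL,
        node_id      TEXT    NOT NULL,
        valid_from   INTEGER NOT NULL,
        valid_before INTEGER
    )
""")


def set_entry(parent, name, node_id, revision):
    """Change one entry: close the old record, insert a new one."""
    conn.execute(
        "UPDATE dir_entries SET valid_before = ? "
        "WHERE parent = ? AND name = ? AND valid_before IS NULL",
        (revision, parent, name),
    )
    conn.execute(
        "INSERT INTO dir_entries VALUES (?, ?, ?, ?, NULL)",
        (parent, name, node_id, revision),
    )


def list_dir(parent, revision):
    """Return {name: node_id} for a directory as of the given revision."""
    rows = conn.execute(
        "SELECT name, node_id FROM dir_entries "
        "WHERE parent = ? AND valid_from <= ? "
        "AND (valid_before IS NULL OR valid_before > ?)",
        (parent, revision, revision),
    ).fetchall()
    return dict(rows)


set_entry("/trunk", "a.c", "node-1", 1)
set_entry("/trunk", "b.c", "node-2", 1)
set_entry("/trunk", "a.c", "node-3", 2)  # touches only the a.c records

at_r1 = list_dir("/trunk", 1)
at_r2 = list_dir("/trunk", 2)
total_rows = conn.execute("SELECT COUNT(*) FROM dir_entries").fetchone()[0]
```

Changing `a.c` in revision 2 costs one UPDATE and one INSERT regardless of
how many other entries the directory has; the unchanged `b.c` row is shared
by both revisions.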

I hope that this explains it a bit better.

Regards,

Phil

Re: Changes in SQL backend design

Posted by Philipp Marek <ph...@emerion.com>.

Hello Jan,

Sorry for the delay; but reading the thread "Severe performance issues with 
large directories" I just remembered that the backend has a little bit of a 
problem with big directories - storage overhead.

Do you see any way to split directories into a series of blocks (like files 
are done), and when changing only a few of the files using pointers to the 
unmodified blocks of the old directory?

I don't propose a real delta design - that was too slow, IIRC.
Just re-use of directory blocks; that shouldn't bring any performance issues.
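
To make the block re-use idea concrete, here is a toy sketch in plain
Python. It assumes fixed-size blocks of sorted entries that are addressed
by the SHA1 of their contents; the block size, the JSON serialization and
all function names are made up for illustration and are not how any
existing backend stores directories.

```python
import hashlib
import json

BLOCK_SIZE = 2    # entries per block; tiny on purpose for the example
block_store = {}  # block id -> serialized block contents


def store_block(entries):
    """Store one block and return its content-derived id."""
    data = json.dumps(entries, sort_keys=True).encode()
    block_id = hashlib.sha1(data).hexdigest()
    block_store[block_id] = data  # identical blocks are stored only once
    return block_id


def write_directory(entries):
    """Split a directory into blocks; return the list of block ids."""
    names = sorted(entries)
    return [
        store_block({n: entries[n] for n in names[i:i + BLOCK_SIZE]})
        for i in range(0, len(names), BLOCK_SIZE)
    ]


def read_directory(block_ids):
    """Reassemble a directory from its list of block ids."""
    entries = {}
    for block_id in block_ids:
        entries.update(json.loads(block_store[block_id]))
    return entries


rev1 = write_directory({"a": "n1", "b": "n2", "c": "n3", "d": "n4"})
# Revision 2 changes only entry "d"; the block holding "a" and "b" is reused.
rev2 = write_directory({"a": "n1", "b": "n2", "c": "n3", "d": "n5"})
shared = set(rev1) & set(rev2)
```

Note that with this naive fixed partitioning, adding or removing an entry
shifts all later entries into different blocks, so a real design would need
a smarter way to choose block boundaries.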


Is there some way to do that? Perhaps multiple "." entries in a directory, 
which just point to other parts?


Regards,

Phil

Re: Changes in SQL backend design

Posted by Jan Horák <ho...@gmail.com>.

On 6.4.2010 10:50, Philipp Marek wrote:
> Hello Jan!
>
> On Tuesday, 6 April 2010, Jan Horák wrote:
>
>> Hi, I've made some changes to the SQL backend design; the updated design is at:
>> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100405_mysql_design_v2.png
>>      
> ...
>    
>> The last thing is the discussed storing of the representations in files,
>> not in DB. I've made a simple test of the access speed to database and
>> to pure filesystem, the report is here:
>> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100406_data_access_speed_comparison.pdf
>>      
> Fine, thank you very much.
>
>    
>> The conclusion is that SQL can be faster when many small files are
>> read for the first time (thanks to better caching); in other cases the
>> filesystem is faster, as expected. But Greg S. made a very good point
>>
>> that I agree with:
>>   >  My gut says "not that much faster". In most scenarios, the network
>>   >  bandwidth between the client/server will be the bottleneck. Reading
>>   >  the data off a disk (rather than from a DB) is not going to make the
>>   >  WAN connection any faster.
>> ...
>>   >  I'd go with the "store content in the database" until performance
>>   >  figures (or a DBA) demonstrates it is a problem.
>>
>> Thus the data stay in DB right now.
>>      
> I'm fine with that.
>
> Just a minor nit: how about allowing a mixed design (later, optional, when the
> backend is running)? Ie keep blocks smaller than N in the database, but write
> larger ones to the filesystem? Or provide different paths depending on the
> block size?
>
> Then people with SSDs could use them for the small blocks (or just keep them
> in the database, as before), but larger data entities could be read from disk
> directly.
>
> That would probably make some sense, as small blocks don't matter if they're
> travelling across a few pipes; but for multi-MB data blocks that should be
> avoided.
>
>    

This is an interesting idea, and this approach seems to be the fastest from
my perspective.

On the other hand, it would be quite complicated to implement. I would
rather wait until an SQL backend prototype exists, then try all possible
alternatives and choose the best one.

Regards

Jan Horak

Re: Changes in SQL backend design

Posted by Philipp Marek <ph...@emerion.com>.

Hello Jan!

On Tuesday, 6 April 2010, Jan Horák wrote:
> Hi, I've made some changes to the SQL backend design; the updated design is at:
> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100405_mysql_design_v2.png
...
> The last thing is the discussed storing of the representations in files,
> not in DB. I've made a simple test of the access speed to database and
> to pure filesystem, the report is here:
> http://www.stud.fit.vutbr.cz/~xhorak50/diplomathesis/files/100406_data_access_speed_comparison.pdf
Fine, thank you very much.

> The conclusion is that SQL can be faster when many small files are
> read for the first time (thanks to better caching); in other cases the
> filesystem is faster, as expected. But Greg S. made a very good point
> 
> that I agree with:
>  > My gut says "not that much faster". In most scenarios, the network
>  > bandwidth between the client/server will be the bottleneck. Reading
>  > the data off a disk (rather than from a DB) is not going to make the
>  > WAN connection any faster.
> ...
>  > I'd go with the "store content in the database" until performance
>  > figures (or a DBA) demonstrates it is a problem.
> 
> Thus the data stay in DB right now.
I'm fine with that.

Just a minor nit: how about allowing a mixed design (later, optional, when the
backend is running)? I.e., keep blocks smaller than N in the database, but write
larger ones to the filesystem? Or provide different paths depending on the
block size?

Then people with SSDs could use them for the small blocks (or just keep them 
in the database, as before), but larger data entities could be read from disk 
directly.

That would probably make some sense, as small blocks don't matter if they're 
travelling across a few pipes; but for multi-MB data blocks that should be 
avoided.
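
A minimal sketch of such a mixed layout, purely as an illustration. The
threshold value, the `blocks` table, the helper names and the temporary
directory are all assumptions, and SQLite stands in for the real database:

```python
import os
import sqlite3
import tempfile

THRESHOLD = 1024  # the "N" above; the value is an arbitrary assumption

conn = sqlite3.connect(":memory:")
# Exactly one of contents / path is set for each block.
conn.execute(
    "CREATE TABLE blocks (block_id TEXT PRIMARY KEY, contents BLOB, path TEXT)"
)
blob_dir = tempfile.mkdtemp()  # stands in for the on-disk block area


def put_block(block_id: str, data: bytes) -> None:
    """Keep small blocks in the database, write large ones to a file."""
    if len(data) < THRESHOLD:
        conn.execute("INSERT INTO blocks VALUES (?, ?, NULL)", (block_id, data))
    else:
        path = os.path.join(blob_dir, block_id)
        with open(path, "wb") as f:
            f.write(data)
        conn.execute("INSERT INTO blocks VALUES (?, NULL, ?)", (block_id, path))


def get_block(block_id: str) -> bytes:
    """Read a block back from wherever it was stored."""
    contents, path = conn.execute(
        "SELECT contents, path FROM blocks WHERE block_id = ?", (block_id,)
    ).fetchone()
    if path is None:
        return bytes(contents)
    with open(path, "rb") as f:
        return f.read()


put_block("small", b"x" * 10)
put_block("large", b"y" * 5000)
small_on_disk = os.path.exists(os.path.join(blob_dir, "small"))
large_on_disk = os.path.exists(os.path.join(blob_dir, "large"))
```

The database stays the single index of all blocks; only the payload of the
large ones moves out, so callers of `get_block` do not need to know where a
block lives.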

Thank you very much!

Regards,

Phil

PS: I'd like to know which filesystems you did test, BTW ;-)
