Posted to dev@subversion.apache.org by Edmund Horner <ej...@paradise.net.nz> on 2005/06/10 04:05:39 UTC

(SoC) SQL storage option for libsvn_fs_base

Hello everyone.

You might recall my on-again-off-again interest in storing Subversion
data in SQL; well, I've begun to think about it again and come up with a
new plan for doing it.

I'm going to submit this application to Google's Summer of Code, but I
thought I'd air it here first in case there are obvious things I'm
forgetting about.

Please comment by 14 June. :-)



Re: (SoC) SQL storage option for libsvn_fs_base

Posted by John Peacock <jp...@rowman.com>.
Charles Bailey wrote:
> Granting that the details of connecting to the DB and SQL extensions
> differ, it'd be nice if a prototypical SQL backend tried to stay as
> general as possible (e.g. sequences are available in most
> commonly-used RDBs, but table inheritance isn't), to make it more
> possible to port the result from Postgres to, say, MySQL.

That's part of avoiding premature optimization.  Any initial SQL backend 
should have a very-low-level layer which deals with actually 
communicating with the database and a slightly-higher-level layer which 
performs as generic a SQL query as possible (no proprietary extensions). 
Only when the first pass is complete will it [hopefully] become 
obvious whether or not performance issues require use of DB-specific 
extensions.
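
As a rough illustration of that two-layer split (a sketch only, not code from any Subversion tree): the class names Driver and RevisionStore and the revisions table are hypothetical, and Python's sqlite3 module stands in for whichever database is actually targeted. Note that even the "?" placeholder style is a driver detail, which is one reason to keep it behind the low-level layer.

```python
import sqlite3


class Driver:
    """Low-level layer: the only place that knows which database is in use."""

    def __init__(self, path=":memory:"):
        # sqlite3 is a stand-in; a PostgreSQL driver would be swapped in here.
        self._conn = sqlite3.connect(path)

    def run(self, sql, params=()):
        # Execute one statement and return all rows (empty for non-queries).
        cur = self._conn.execute(sql, params)
        rows = cur.fetchall()
        self._conn.commit()
        return rows


class RevisionStore:
    """Higher layer: plain SQL only, no vendor extensions."""

    def __init__(self, driver):
        self._db = driver
        self._db.run(
            "CREATE TABLE revisions (rev INTEGER PRIMARY KEY, author VARCHAR(64))"
        )

    def add(self, rev, author):
        self._db.run(
            "INSERT INTO revisions (rev, author) VALUES (?, ?)", (rev, author)
        )

    def author_of(self, rev):
        rows = self._db.run(
            "SELECT author FROM revisions WHERE rev = ?", (rev,)
        )
        return rows[0][0] if rows else None


store = RevisionStore(Driver())
store.add(1, "ejrh")
print(store.author_of(1))
```

Porting to another database would then mean rewriting Driver only, provided the statements in the higher layer really do stay generic.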

My $0.02...

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD  20706
301-459-3366 x.5010
fax 301-429-5748

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: (SoC) SQL storage option for libsvn_fs_base

Posted by Charles Bailey <ba...@gmail.com>.
On 14 Jun 2005 11:09:08 -0500, C. Michael Pilato <cm...@collab.net> wrote:
>
> I've been recently warned against referring to a generic SQL backend,
> as the flavors of SQL-ready databases differ wildly enough that you
> really need to make special considerations for each one in order to
> harvest the real utility of any of them.  So, unless we really think
> we can pull off a generic SQL backend that just works with all SQL
> databases, let's call this one what it is intended to be -- a
> PostgreSQL backend -- shall we?  Just trying to eliminate confusion
> and properly set expectations of the receiving public.

Granting that the details of connecting to the DB and SQL extensions
differ, it'd be nice if a prototypical SQL backend tried to stay as
general as possible (e.g. sequences are available in most
commonly-used RDBs, but table inheritance isn't), to make it more
possible to port the result from Postgres to, say, MySQL.

Just my $0.02.


-- 
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu



Re: (SoC) SQL storage option for libsvn_fs_base

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
>>>...
>>
>>So, in reviewing your new plan, it strikes me as the same plan that
>>Glenn A. Thompson (who likely cometh hence from the ether, but for a
>>moment, as I didst utter his name) has been talking about
>>on-again-off-again for ... gosh, years, now.  Don't get me wrong --
>>that's a good thing!
>
>Yup!
>
What's the good part?  The fact that it appears to be the same plan, or 
the fact that I've been talking about it for years now:-)

In case you are wondering what low-level code I *think* I have, this 
link talks a little about it.
Please keep in mind that it is more than two years old.  
http://www.cdrguys.com/subversion/sql_fs_docs/svn_fs_sql.htm  (this box 
is in NY and will stay up while I move)
Please don't get hung up in the upper layer discussion.  It's mostly 
covered by the pluggable work already in the repo. 

My code had a couple pretty slick features as I recall.  One was the 
ability to script SQL statements from Subversion config files.  It even 
allowed you to specify allowed errors for things like duplicate rows 
etc.  All the code used APR libs for memory management.
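
A minimal sketch of what config-scripted SQL with allowed errors could look like. This is not the original code (which used APR and is not shown in this thread); the config layout, the "allowed_errors" key, and the function names are all hypothetical, and sqlite3 stands in for the real database.

```python
import configparser
import sqlite3

# Hypothetical config format: one section per named statement.
CONFIG_TEXT = """
[create_nodes]
sql = CREATE TABLE nodes (id INTEGER PRIMARY KEY, path VARCHAR(255))

[insert_node]
sql = INSERT INTO nodes (id, path) VALUES (?, ?)
allowed_errors = duplicate
"""

# Map the config's error names to driver exception classes (an assumption).
ERROR_CLASSES = {"duplicate": sqlite3.IntegrityError}


def load_statements(text):
    """Parse the config into {name: (sql, allowed exception classes)}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    statements = {}
    for name in parser.sections():
        allowed = parser.get(name, "allowed_errors", fallback="").split()
        classes = tuple(ERROR_CLASSES[a] for a in allowed)
        statements[name] = (parser.get(name, "sql"), classes)
    return statements


def run_statement(conn, statements, name, params=()):
    """Run a named statement; return False if it hit an allowed error."""
    sql, allowed = statements[name]
    try:
        conn.execute(sql, params)
        return True
    except allowed:
        # Errors not listed for this statement still propagate.
        return False


conn = sqlite3.connect(":memory:")
stmts = load_statements(CONFIG_TEXT)
run_statement(conn, stmts, "create_nodes")
first = run_statement(conn, stmts, "insert_node", (1, "/trunk"))
second = run_statement(conn, stmts, "insert_node", (1, "/trunk"))
print(first, second)
```

The second insert violates the primary key, but because "duplicate" is listed as allowed for that statement, it is reported rather than raised.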

Colorado here I come!

gat




Re: (SoC) SQL storage option for libsvn_fs_base

Posted by Edmund Horner <ej...@paradise.net.nz>.
C. Michael Pilato wrote:
> Edmund Horner <ej...@paradise.net.nz> writes:
> 
>>...
> 
> So, in reviewing your new plan, it strikes me as the same plan that
> Glenn A. Thompson (who likely cometh hence from the ether, but for a
> moment, as I didst utter his name) has been talking about
> on-again-off-again for ... gosh, years, now.  Don't get me wrong --
> that's a good thing!

Yup!

>>...
> 
> Blah blah yadda yadda ... :-)

What a very astute remark!  :-)

>>Deliverables:
>>
>>  * Storage layer abstraction in libsvn_fs_base, with the BDB storage option
>>    continuing to work as before.
>>
>>  * Addition of an SQL storage option, initially storing data in a PostgreSQL
>>    database.
>>
>>  * SQL functions for exploring the repository in an SQL client.
> 
> 
> Okey dokey.  You forgot one, though:
> 
>     * Doing all this without noticeably negative performance impact on
>       existing backends.
>
> [Rest of the proposal trimmed]
> 
> I've been recently warned against referring to a generic SQL backend,
> as the flavors of SQL-ready databases differ wildly enough that you
> really need to make special considerations for each one in order to
> harvest the real utility of any of them.  So, unless we really think
> we can pull off a generic SQL backend that just works with all SQL
> databases, let's call this one what it is intended to be -- a
> PostgreSQL backend -- shall we?  Just trying to eliminate confusion
> and properly set expectations of the receiving public.

Yes, I see.  I certainly do plan to avoid any of the PostgreSQL
extensions, or in fact any "standard" SQL features that aren't widely
available, in the libsvn_fs_base work.  This probably means going
without user-defined functions, and maybe without views and subselects.
I think that's easy enough.  My main concern for portability is data
types (e.g. BYTEA vs. BLOB vs. etc.).
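
One way the data type concern might be handled is a small per-database type map used when generating the schema. The sketch below is illustrative only: the logical type names, the function name, and the example table are hypothetical. The binary type spellings (BYTEA in PostgreSQL, BLOB/LONGBLOB in MySQL, BLOB in SQLite) are the databases' own.

```python
# Logical column types used by the schema, mapped to each database's spelling.
TYPE_MAP = {
    "postgresql": {"id": "INTEGER", "text": "VARCHAR(255)", "bytes": "BYTEA"},
    "mysql": {"id": "INTEGER", "text": "VARCHAR(255)", "bytes": "LONGBLOB"},
    "sqlite": {"id": "INTEGER", "text": "VARCHAR(255)", "bytes": "BLOB"},
}


def create_table_sql(dialect, table, columns):
    """Build a CREATE TABLE statement from (column name, logical type) pairs."""
    types = TYPE_MAP[dialect]
    cols = ", ".join(f"{name} {types[kind]}" for name, kind in columns)
    return f"CREATE TABLE {table} ({cols})"


# Hypothetical table holding raw file contents.
STRINGS = [("id", "id"), ("content", "bytes")]

print(create_table_sql("postgresql", "strings", STRINGS))
print(create_table_sql("mysql", "strings", STRINGS))
```

Only schema creation needs the map; the queries themselves can stay identical across databases as long as they avoid type-specific functions.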

But since at some point I'll need to do the very database-specific
"#include <postgres_fe.h>"; well, perhaps there will be
database-specific functions for things like creating the repository.

If, in light of the above, you think "PostgreSQL backend" is more
accurate, I'm happy to consider that.



Re: (SoC) SQL storage option for libsvn_fs_base

Posted by "C. Michael Pilato" <cm...@collab.net>.
Edmund Horner <ej...@paradise.net.nz> writes:

> You might recall my on-again-off-again interest in storing Subversion
> data in SQL; well, I've begun to think about it again and come up with a
> new plan for doing it.

So, in reviewing your new plan, it strikes me as the same plan that
Glenn A. Thompson (who likely cometh hence from the ether, but for a
moment, as I didst utter his name) has been talking about
on-again-off-again for ... gosh, years, now.  Don't get me wrong --
that's a good thing!

> SQL STORAGE FOR THE LIBSVN_FS_BASE FILESYSTEM BACKEND
> 
> The storage layer of libsvn_fs_base will be abstracted, and an
> implementation of an SQL storage option will be added alongside the existing
> BDB storage option.
> 
> Benefits:
> 
> The benefits of this contribution will be a complete first SQL filesystem
> backend, and the increased querying options that this will bring to
> Subversion administrators.  For instance, queries such as "which file
> revisions have property foo?", or "which files are direct or indirect copies
> of /trunk/README?" will be efficient and available as SQL queries.  Another
> possible benefit of the SQL filesystem is repository replication.

Blah blah yadda yadda ... :-)

> Deliverables:
> 
>   * Storage layer abstraction in libsvn_fs_base, with the BDB storage option
>     continuing to work as before.
> 
>   * Addition of an SQL storage option, initially storing data in a PostgreSQL
>     database.
> 
>   * SQL functions for exploring the repository in an SQL client.

Okey dokey.  You forgot one, though:

    * Doing all this without noticeably negative performance impact on
      existing backends.

[Rest of the proposal trimmed]

I've been recently warned against referring to a generic SQL backend,
as the flavors of SQL-ready databases differ wildly enough that you
really need to make special considerations for each one in order to
harvest the real utility of any of them.  So, unless we really think
we can pull off a generic SQL backend that just works with all SQL
databases, let's call this one what it is intended to be -- a
PostgreSQL backend -- shall we?  Just trying to eliminate confusion
and properly set expectations of the receiving public.


Re: (SoC) SQL storage option for libsvn_fs_base

Posted by Edmund Horner <ej...@paradise.net.nz>.
John Fallows wrote:
> Edmund,
> 
> When you start looking into the schema design in more detail, the
> following article discussing how to store Trees in SQL might be
> helpful.

Thanks!  That's a very useful article for SQL in general.

I don't think I'll be using it for this project though, since I'm
basically reusing the schema they use for the BDB backend.  (On the
other hand, that method of using the transitive closure could make some
queries more efficient.)
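
To make the transitive-closure remark concrete, here is a minimal sketch using a closure table, which is a related technique and not the exact method from the article. The table and function names are hypothetical, sqlite3 stands in for the real database, and the sketch assumes copies are recorded in chronological order. The point is that "direct or indirect copies of X" becomes a single flat SELECT with no recursion or subselects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE copy_closure (ancestor VARCHAR(255), descendant VARCHAR(255))"
)


def record_copy(src, dst):
    """Record that dst was copied from src, extending the closure."""
    # dst descends from everything src descends from...
    ancestors = [
        row[0]
        for row in conn.execute(
            "SELECT ancestor FROM copy_closure WHERE descendant = ?", (src,)
        )
    ]
    # ...and from src itself.
    ancestors.append(src)
    conn.executemany(
        "INSERT INTO copy_closure (ancestor, descendant) VALUES (?, ?)",
        [(a, dst) for a in ancestors],
    )


def copies_of(path):
    """Return every direct or indirect copy of path."""
    rows = conn.execute(
        "SELECT descendant FROM copy_closure"
        " WHERE ancestor = ? ORDER BY descendant",
        (path,),
    )
    return [row[0] for row in rows]


record_copy("/trunk/README", "/branches/1.0/README")
record_copy("/branches/1.0/README", "/tags/1.0.0/README")
record_copy("/trunk/INSTALL", "/branches/1.0/INSTALL")
print(copies_of("/trunk/README"))
```

The cost is extra rows written at copy time: one per ancestor of the source, plus one for the source itself.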

Edmund.



Re: (SoC) SQL storage option for libsvn_fs_base

Posted by John Fallows <jo...@gmail.com>.
Edmund,

When you start looking into the schema design in more detail, the
following article discussing how to store Trees in SQL might be
helpful.

----
Trees in SQL: Nested Sets and Materialized Path
by Vadim Tropashko

Relational databases are universally conceived of as an advance over
their predecessors network and hierarchical models. Superior in every
querying respect, they turned out to be surprisingly incomplete when
modeling transitive dependencies. Almost every couple of months a
question about how to model a tree in the database pops up at the
comp.database.theory newsgroup. In this article I'll investigate two
out of four well known approaches to accomplishing this and show a
connection between them. We'll discover a new method that could be
considered as a "mix-in" between materialized path and nested sets.
----

http://www.dbazine.com/oracle/or-articles/tropashko4

Kind Regards,
John Fallows.

On 6/9/05, Edmund Horner <ej...@paradise.net.nz> wrote:
> Hello everyone.
> 
> You might recall my on-again-off-again interest in storing Subversion
> data in SQL; well, I've begun to think about it again and come up with a
> new plan for doing it.
> 
> I'm going to submit this application to Google's Summer of Code, but I
> thought I'd air it here first in case there are obvious things I'm
> forgetting about.
> 
> Please comment by 14 June. :-)
> 
> 
> 
> 
> Google "Summer of Code" proposal for the Subversion project
> 
> Edmund Horner
> ejrh@paradise.net.nz
> 
> 
> SQL STORAGE FOR THE LIBSVN_FS_BASE FILESYSTEM BACKEND
> 
> The storage layer of libsvn_fs_base will be abstracted, and an
> implementation of an SQL storage option will be added alongside the existing
> BDB storage option.
> 
> Benefits:
> 
> The benefits of this contribution will be a complete first SQL filesystem
> backend, and the increased querying options that this will bring to
> Subversion administrators.  For instance, queries such as "which file
> revisions have property foo?", or "which files are direct or indirect copies
> of /trunk/README?" will be efficient and available as SQL queries.  Another
> possible benefit of the SQL filesystem is repository replication.
> 
> Deliverables:
> 
>   * Storage layer abstraction in libsvn_fs_base, with the BDB storage option
>     continuing to work as before.
> 
>   * Addition of an SQL storage option, initially storing data in a PostgreSQL
>     database.
> 
>   * SQL functions for exploring the repository in an SQL client.
> 
> Details:
> 
> The BDB-specific functions (mostly in the bdb subdirectory) will be further
> separated from libsvn_fs_base.  A storage vtable will be created, which will
> be used anywhere that the code currently calls a BDB function directly.
> This vtable structure will be private to libsvn_fs_base.
> 
> An SQL implementation of the vtable will be created, with a very similar
> structure to the BDB implementation.  Schema changes will include using
> integers for unique identifiers, replacing skels with individual SQL
> attributes, and storing properties and directory entries as individual rows.
> 
> SQL functions for queries like those mentioned above will be written and
> automatically created as part of repository creation.  These functions will
> be designed for human exploration of the repository filesystem.
> 
> Changes will occur in libsvn_fs_base, with the exception of additional
> command-line arguments to svnadmin for repository creation (and the passing
> of those arguments through to libsvn_fs_base).
> 
> Schedule:
> 
> Work will begin full-time after my last mid-year examination on 18 June.
> I have already spent some time recently improving my familiarity with the
> existing filesystem backends.  I can work full time on it until 4 July, when
> the second half of the study year begins.  After that I will continue work
> at a slower pace.
> 
> If there are no contradictions in the design, I believe the work can be
> completed within 2 or 3 months part-time.
> 
> Biography:
> 
> I am a student at Victoria University of Wellington, studying Mathematics
> and Computer Science.  I will likely complete a Bachelor of Science degree
> in the first half of 2006.
> 
> I have been a Subversion user since November 2002, and until now have taken
> a mostly-passive interest in its development.  I worked on a short-lived
> MySQL-based filesystem backend in early 2004, and also contributed one small
> patch (issue #1861 "library version checks", r10167) in July 2004.
> 
> 
> 
> 
>
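
The storage abstraction described in the quoted proposal's "Details" section might look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: libsvn_fs_base is C and would use a struct of function pointers, whereas here an abstract base class plays the role of the vtable. All class, method, and table names are hypothetical; a dict stands in for the existing key/value storage and sqlite3 for the SQL database. It also shows properties stored as individual rows, which is what makes "which node revisions have property foo?" a direct query.

```python
import sqlite3
from abc import ABC, abstractmethod


class Storage(ABC):
    """Stand-in for the storage vtable: the calls the filesystem code makes."""

    @abstractmethod
    def set_prop(self, node_rev, name, value): ...

    @abstractmethod
    def get_props(self, node_rev): ...


class DictStorage(Storage):
    """Stand-in for key/value storage: one record per node revision."""

    def __init__(self):
        self._records = {}

    def set_prop(self, node_rev, name, value):
        self._records.setdefault(node_rev, {})[name] = value

    def get_props(self, node_rev):
        return dict(self._records.get(node_rev, {}))


class SqlStorage(Storage):
    """SQL storage: each property is its own row, so it can be queried."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE props (node_rev INTEGER, name VARCHAR(255),"
            " value VARCHAR(255), PRIMARY KEY (node_rev, name))"
        )

    def set_prop(self, node_rev, name, value):
        # Delete-then-insert avoids relying on any vendor "upsert" syntax.
        self._conn.execute(
            "DELETE FROM props WHERE node_rev = ? AND name = ?",
            (node_rev, name),
        )
        self._conn.execute(
            "INSERT INTO props (node_rev, name, value) VALUES (?, ?, ?)",
            (node_rev, name, value),
        )

    def get_props(self, node_rev):
        rows = self._conn.execute(
            "SELECT name, value FROM props WHERE node_rev = ?", (node_rev,)
        )
        return dict(rows.fetchall())

    def node_revs_with_prop(self, name):
        rows = self._conn.execute(
            "SELECT node_rev FROM props WHERE name = ? ORDER BY node_rev",
            (name,),
        )
        return [row[0] for row in rows]


sql = SqlStorage()
for storage in (DictStorage(), sql):
    # The calling code is identical for both storage options.
    storage.set_prop(1, "foo", "bar")
    storage.set_prop(2, "svn:eol-style", "native")
    storage.set_prop(3, "foo", "baz")
    print(storage.get_props(1))

print(sql.node_revs_with_prop("foo"))
```

The extra query method exists only on the SQL side, which mirrors the proposal's idea that exploration functions are an addition rather than part of the shared interface.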



Re: (SoC) SQL storage option for libsvn_fs_base

Posted by Chris Foote <cf...@v21.me.uk>.
Have you considered using the apr_dbd api?

Unfortunately, it is in apr-util 1.2.

Chris

----- Original Message ----- 
From: "Edmund Horner" <ej...@paradise.net.nz>
To: <de...@subversion.tigris.org>
Sent: Friday, June 10, 2005 5:05 AM
Subject: (SoC) SQL storage option for libsvn_fs_base


> Hello everyone.
> 
> You might recall my on-again-off-again interest in storing Subversion
> data in SQL; well, I've begun to think about it again and come up with a
> new plan for doing it.
> 
> I'm going to submit this application to Google's Summer of Code, but I
> thought I'd air it here first in case there are obvious things I'm
> forgetting about.
> 
> Please comment by 14 June. :-)
> 

