Posted to dev@subversion.apache.org by Phil Cross <pc...@pixeltecs.com> on 2009/03/25 20:40:00 UTC

Enhancement suggestion...

I'm not sure where to post a suggestion for future versions but this is
definitely not a "bug".  If this isn't the place or if you guys have
already thought of it, please let me know.

I know that svn is typically used for source code control and not binary
files, so this may not be a common usage.  I have a library of
multimedia that I maintain in version control alongside its source
files.  Sometimes I have a situation where I have updated some
multimedia "player" that is used in several locations.  Then I update
all the directories where the player resides with the newest player.
Then I commit the revision.  Subversion proceeds to then upload the same
file over and over again to the repository.  It seems that the
client/server communication could be a little smarter and only send a
file once if it is the same file.

As I am writing this, I am committing a media directory that contains 57
"lessons": media/lesson1, media/lesson2, etc...  I have a 5MB file
called lesson.exe in each of these that I had to change.  I have to keep
the file in each directory because my company often distributes a single
"lesson" and not the entire collection.

I am not familiar with the protocol on the back end of subversion but I
assume it goes something like this:

put file1 at /xyz
put file2 at /asdf
put file3 at /xyz/asdf

Let's assume file1 and file3 are actually the same binary data.  Maybe
you could just add some sort of back reference that the server could
understand like this:

put file1 at /xyz label=A
put file2 at /asdf
put [A] at /xyz/asdf

The client would have to be smart enough to determine which files match
and could inform the server in this manner.  Obviously, this is only a
suggested implementation as I am not familiar with the protocol.
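
A minimal sketch of the back-reference idea above, assuming the client can hash every file before sending. The operation names ("put", "ref"), the label format and the plan_uploads helper are hypothetical illustrations, not part of any actual Subversion protocol:

```python
import hashlib


def plan_uploads(files):
    """Plan one upload per distinct content.

    files: list of (path, data) pairs, where data is bytes.
    Returns ("put", path, label) for the first occurrence of some
    content and ("ref", path, label) for every later identical copy.
    """
    labels = {}  # SHA-1 hex digest -> label given at first upload
    ops = []
    for path, data in files:
        digest = hashlib.sha1(data).hexdigest()
        if digest in labels:
            # Same bytes already scheduled: send only a back reference.
            ops.append(("ref", path, labels[digest]))
        else:
            label = "L%d" % (len(labels) + 1)
            labels[digest] = label
            ops.append(("put", path, label))
    return ops


if __name__ == "__main__":
    demo = [("/xyz", b"player"), ("/asdf", b"other"), ("/xyz/asdf", b"player")]
    for op in plan_uploads(demo):
        print(op)
```

With 57 identical copies of a 5MB file, a plan like this would contain one "put" and 56 "ref" entries.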

Thanks (if nothing else, just for reading this),
Phil

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1419789

Re: Enhancement suggestion...

Posted by Branko Cibej <br...@xbc.nu>.
Greg Stein wrote:
> On Thu, Mar 26, 2009 at 14:59, Mark Phippard <ma...@gmail.com> wrote:
>> On Thu, Mar 26, 2009 at 9:58 AM, Philipp Marek
>> <ph...@emerion.com> wrote:
>>> On Thursday, 26 March 2009, Mark Phippard wrote:
>>>> On Thu, Mar 26, 2009 at 9:35 AM, Greg Stein <gs...@gmail.com> wrote:
>>>> With the new rep-sharing feature we in theory do not store the file
>>>> multiple times in the repository.  So what if a client were to tell
>>>> the server what it was going to send, along with the checksums, and
>>>> the server were to reply, OK, send me this one, but not that one.  I
>>>> already have it?
>>>>
>>>> I recall you were thinking of something similar in WC-NG for
>>>> checkout/update etc.  If the server is going to give the file it
>>>> already has in its cache, it could skip downloading it.
>>> I'd like to remind you of the "fs-rep-sharing branch" discussion last year ...
>>>   http://svn.haxx.se/dev/archive-2008-10/0853.shtml
>>>
>>> Just using the MD5 might not be enough to define files as "equal"; at least
>>> use SHA256 or something like that.
>> What's good enough for the repository storage, ought to be good enough
>> for this.  I thought it was changed to SHA-1 or something anyway?
>
> The storage keys them via SHA1, I believe.
>
> We certainly keep both in the repository.
>
> Right now, the client really only deals with MD5 checksums for
> historical reasons. When we revamp the editor interface, then we can
> switch to SHA1.

I've always felt that using *both* SHA1 and MD5 would be best ...
they're based on different algorithms, which makes it that much harder
to spoof both.
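
As a rough illustration of that point, here is a sketch that treats two blobs as equal only when both digests match. The helper names content_key and same_content are made up for this example and do not reflect how the repository stores its checksums:

```python
import hashlib


def content_key(data):
    """Return an (MD5, SHA-1) pair of hex digests for the given bytes."""
    return (hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest())


def same_content(a, b):
    """Treat two blobs as identical only if both digests agree."""
    return content_key(a) == content_key(b)


if __name__ == "__main__":
    print(content_key(b"lesson.exe contents"))
    print(same_content(b"abc", b"abc"), same_content(b"abc", b"abd"))
```

An attacker would need a simultaneous collision in both functions to make two different files share a key.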

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1431670


Re: Enhancement suggestion...

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Mar 26, 2009 at 14:59, Mark Phippard <ma...@gmail.com> wrote:
> On Thu, Mar 26, 2009 at 9:58 AM, Philipp Marek
> <ph...@emerion.com> wrote:
>> On Thursday, 26 March 2009, Mark Phippard wrote:
>>> On Thu, Mar 26, 2009 at 9:35 AM, Greg Stein <gs...@gmail.com> wrote:
>>> With the new rep-sharing feature we in theory do not store the file
>>> multiple times in the repository.  So what if a client were to tell
>>> the server what it was going to send, along with the checksums, and
>>> the server were to reply, OK, send me this one, but not that one.  I
>>> already have it?
>>>
>>> I recall you were thinking of something similar in WC-NG for
>>> checkout/update etc.  If the server is going to give the file it
>>> already has in its cache, it could skip downloading it.
>> I'd like to remind you of the "fs-rep-sharing branch" discussion last year ...
>>   http://svn.haxx.se/dev/archive-2008-10/0853.shtml
>>
>> Just using the MD5 might not be enough to define files as "equal"; at least
>> use SHA256 or something like that.
>
> What's good enough for the repository storage, ought to be good enough
> for this.  I thought it was changed to SHA-1 or something anyway?

The storage keys them via SHA1, I believe.

We certainly keep both in the repository.

Right now, the client really only deals with MD5 checksums for
historical reasons. When we revamp the editor interface, then we can
switch to SHA1.

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1430444


Re: Enhancement suggestion...

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Mar 26, 2009 at 9:58 AM, Philipp Marek
<ph...@emerion.com> wrote:
> On Thursday, 26 March 2009, Mark Phippard wrote:
>> On Thu, Mar 26, 2009 at 9:35 AM, Greg Stein <gs...@gmail.com> wrote:
>> With the new rep-sharing feature we in theory do not store the file
>> multiple times in the repository.  So what if a client were to tell
>> the server what it was going to send, along with the checksums, and
>> the server were to reply, OK, send me this one, but not that one.  I
>> already have it?
>>
>> I recall you were thinking of something similar in WC-NG for
>> checkout/update etc.  If the server is going to give the file it
>> already has in its cache, it could skip downloading it.
> I'd like to remind you of the "fs-rep-sharing branch" discussion last year ...
>   http://svn.haxx.se/dev/archive-2008-10/0853.shtml
>
> Just using the MD5 might not be enough to define files as "equal"; at least
> use SHA256 or something like that.

What's good enough for the repository storage, ought to be good enough
for this.  I thought it was changed to SHA-1 or something anyway?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1430405


Re: Enhancement suggestion...

Posted by Philipp Marek <ph...@emerion.com>.
On Thursday, 26 March 2009, Mark Phippard wrote:
> On Thu, Mar 26, 2009 at 9:35 AM, Greg Stein <gs...@gmail.com> wrote:
> With the new rep-sharing feature we in theory do not store the file
> multiple times in the repository.  So what if a client were to tell
> the server what it was going to send, along with the checksums, and
> the server were to reply, OK, send me this one, but not that one.  I
> already have it?
>
> I recall you were thinking of something similar in WC-NG for
> checkout/update etc.  If the server is going to give the file it
> already has in its cache, it could skip downloading it.
I'd like to remind you of the "fs-rep-sharing branch" discussion last year ...
   http://svn.haxx.se/dev/archive-2008-10/0853.shtml

Just using the MD5 might not be enough to define files as "equal"; at least 
use SHA256 or something like that.


Regards,

Phil



Re: Enhancement suggestion...

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Mar 26, 2009 at 14:47, Mark Phippard <ma...@gmail.com> wrote:
> On Thu, Mar 26, 2009 at 9:35 AM, Greg Stein <gs...@gmail.com> wrote:
>...
>> I suspect we can have our v2 http protocol do the right operation.
>> Might need some FS changes, though: we'd need to copy a node, but not
>> leave copy information. For http v1, we'd probably just have to send
>> it multiple times like today.
>
> With the new rep-sharing feature we in theory do not store the file
> multiple times in the repository.  So what if a client were to tell
> the server what it was going to send, along with the checksums, and
> the server were to reply, OK, send me this one, but not that one.  I
> already have it?

I thought about the rep-sharing and that we wouldn't store it multiple
times. But I hadn't thought about a pre-flighting test. Good one.

We could pass a list of checksums in the initial POST to the server.
In its reply, it could tell us which checksums it already has (the
list of "have" will probably be shorter than the list of "need").
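
A sketch of how such a pre-flight exchange might look, assuming SHA-1 keys. FakeServer, preflight, upload and commit are stand-ins invented for this example; the real v2 http protocol may look nothing like this:

```python
import hashlib


class FakeServer:
    """In-memory stand-in for a repository keyed by SHA-1 (hypothetical)."""

    def __init__(self):
        self.store = {}  # SHA-1 hex digest -> file contents

    def preflight(self, checksums):
        # Reply with the "have" list: checksums the server already stores.
        return [c for c in checksums if c in self.store]

    def upload(self, data):
        self.store[hashlib.sha1(data).hexdigest()] = data


def commit(server, files):
    """files: dict of path -> bytes. Returns how many bodies were sent."""
    by_sum = {}
    for data in files.values():
        # Identical files collapse onto a single checksum entry.
        by_sum.setdefault(hashlib.sha1(data).hexdigest(), data)
    have = set(server.preflight(list(by_sum)))
    sent = 0
    for digest, data in by_sum.items():
        if digest not in have:
            server.upload(data)
            sent += 1
    return sent


if __name__ == "__main__":
    server = FakeServer()
    files = {"media/lesson%d/lesson.exe" % i: b"new player" for i in range(1, 58)}
    print(commit(server, files))
```

The sketch leaves out the per-node details (add, copy, properties) that the client would still have to send for every path.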

The client will still have to tell the server more about the node:
add? copy? properties? etc. So maybe we do that with PROPPATCH or a
POST.

> I recall you were thinking of something similar in WC-NG for
> checkout/update etc.  If the server is going to give the file it
> already has in its cache, it could skip downloading it.

Right. The checksums come back to the client as properties in a
PROPFIND. We'd simply skip the corresponding GET operation.
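
Roughly, the client-side decision could be sketched as follows. The plan_update function and its inputs are hypothetical: remote_checksums stands for whatever path-to-checksum mapping the PROPFIND responses would yield, and pristine_store for the set of checksums the working copy already holds:

```python
def plan_update(remote_checksums, pristine_store):
    """Split paths into those needing a GET and those served locally.

    remote_checksums: dict of path -> checksum reported by the server.
    pristine_store: iterable of checksums already cached on the client.
    Returns (to_fetch, reuse) as two lists of paths in sorted order.
    """
    seen = set(pristine_store)
    to_fetch, reuse = [], []
    for path, checksum in sorted(remote_checksums.items()):
        if checksum in seen:
            # Content is already cached: skip the GET for this path.
            reuse.append(path)
        else:
            to_fetch.append(path)
            # Once fetched, later paths with the same checksum can reuse it.
            seen.add(checksum)
    return to_fetch, reuse


if __name__ == "__main__":
    remote = {"a": "c1", "b": "c2", "c": "c1"}
    print(plan_update(remote, {"c2"}))
```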

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1430423


Re: Enhancement suggestion...

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Mar 26, 2009 at 9:35 AM, Greg Stein <gs...@gmail.com> wrote:
> On Thu, Mar 26, 2009 at 14:22, Mark Phippard <ma...@gmail.com> wrote:
>> On Thu, Mar 26, 2009 at 9:12 AM, Greg Stein <gs...@gmail.com> wrote:
>>
>>> Another approach in 1.6 that could work is to take advantage of
>>> knowing the protocol steps used. He could add and commit the
>>> lesson.exe to one directory. Then use "svn copy" to make copies of it
>>> to the other directories. At commit time, svn *should* tell the server
>>> to simply copy the file to the other directories. It shouldn't upload
>>> it again.
>>
>> This is the approach that I always use.  It works well for me.
>
> Yeah... but "user strategies" are always annoying; I'd much prefer the
> tool just Doing The Right Thing.
>
> I suspect we can have our v2 http protocol do the right operation.
> Might need some FS changes, though: we'd need to copy a node, but not
> leave copy information. For http v1, we'd probably just have to send
> it multiple times like today.

With the new rep-sharing feature we in theory do not store the file
multiple times in the repository.  So what if a client were to tell
the server what it was going to send, along with the checksums, and
the server were to reply, OK, send me this one, but not that one.  I
already have it?

I recall you were thinking of something similar in WC-NG for
checkout/update etc.  If the server is going to give the file it
already has in its cache, it could skip downloading it.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1430268


Re: Enhancement suggestion...

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Mar 26, 2009 at 14:22, Mark Phippard <ma...@gmail.com> wrote:
> On Thu, Mar 26, 2009 at 9:12 AM, Greg Stein <gs...@gmail.com> wrote:
>
>> Another approach in 1.6 that could work is to take advantage of
>> knowing the protocol steps used. He could add and commit the
>> lesson.exe to one directory. Then use "svn copy" to make copies of it
>> to the other directories. At commit time, svn *should* tell the server
>> to simply copy the file to the other directories. It shouldn't upload
>> it again.
>
> This is the approach that I always use.  It works well for me.

Yeah... but "user strategies" are always annoying; I'd much prefer the
tool just Doing The Right Thing.

I suspect we can have our v2 http protocol do the right operation.
Might need some FS changes, though: we'd need to copy a node, but not
leave copy information. For http v1, we'd probably just have to send
it multiple times like today.

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1430046


Re: Enhancement suggestion...

Posted by Mark Phippard <ma...@gmail.com>.
On Thu, Mar 26, 2009 at 9:12 AM, Greg Stein <gs...@gmail.com> wrote:

> Another approach in 1.6 that could work is to take advantage of
> knowing the protocol steps used. He could add and commit the
> lesson.exe to one directory. Then use "svn copy" to make copies of it
> to the other directories. At commit time, svn *should* tell the server
> to simply copy the file to the other directories. It shouldn't upload
> it again.

This is the approach that I always use.  It works well for me.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1429935

Re: Enhancement suggestion...

Posted by Greg Stein <gs...@gmail.com>.
On Wed, Mar 25, 2009 at 22:07, Hyrum K. Wright
<hy...@mail.utexas.edu> wrote:
> On Mar 25, 2009, at 4:01 PM, Mark Phippard wrote:
>
>> Given that he said lesson.exe I assume this is Windows and we do not
>> support symlinks.
>>
>> With 1.6 we do support svn:externals for files though.  So it could be
>> added just in one place and referenced in others.
>>
>> With all the stuff we are doing around hashes and rep-sharing etc.  It
>> is an interesting idea.  What if the client told the server the
>> checksum of what it was sending and the server just said "don't bother
>> I already have one of those"?
>
> That thought had occurred to me, also.  An intriguing idea, to be sure.

The file external could work pretty well, and is available in 1.6.

In 1.7, we'd definitely know it is the same file because we'll be
storing the pristine version on the client, indexed by its checksum.
We might be able to take advantage of that during the commit process.

Another approach in 1.6 that could work is to take advantage of
knowing the protocol steps used. He could add and commit the
lesson.exe to one directory. Then use "svn copy" to make copies of it
to the other directories. At commit time, svn *should* tell the server
to simply copy the file to the other directories. It shouldn't upload
it again.

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1429825


Re: Enhancement suggestion...

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Mar 25, 2009, at 4:01 PM, Mark Phippard wrote:

> Given that he said lesson.exe I assume this is Windows and we do not
> support symlinks.
>
> With 1.6 we do support svn:externals for files though.  So it could be
> added just in one place and referenced in others.
>
> With all the stuff we are doing around hashes and rep-sharing etc.  It
> is an interesting idea.  What if the client told the server the
> checksum of what it was sending and the server just said "don't bother
> I already have one of those"?

That thought had occurred to me, also.  An intriguing idea, to be sure.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1420098

Re: Enhancement suggestion...

Posted by Mark Phippard <ma...@gmail.com>.
Given that he said lesson.exe I assume this is Windows and we do not
support symlinks.

With 1.6 we do support svn:externals for files though.  So it could be
added just in one place and referenced in others.

With all the stuff we are doing around hashes and rep-sharing etc.  It
is an interesting idea.  What if the client told the server the
checksum of what it was sending and the server just said "don't bother
I already have one of those"?

Mark

On Wed, Mar 25, 2009 at 4:56 PM, Ben Collins-Sussman
<su...@red-bean.com> wrote:
> Side note:  have you considered using symlinks?  Make file 1 "real",
> but then make all the other copies of it just unix symlinks to the
> original?  (This would only work if you're sure that people never
> check out tiny subtrees of the repository.)
>
> On Wed, Mar 25, 2009 at 3:40 PM, Phil Cross <pc...@pixeltecs.com> wrote:
>> I'm not sure where to post a suggestion for future versions but this is
>> definitely not a "bug".  If this isn't the place or if you guys have already
>> thought of it, please let me know.
>>
>> I know that svn is typically used for source code control and not binary
>> files, so this may not be a common usage.  I have a library of multimedia
>> that I maintain in version control alongside its source files.  Sometimes I
>> have a situation where I have updated some multimedia "player" that is used
>> in several locations.  Then, I updated all the directories where the player
>> resides with the newest player.  Then I commit the revision.  Subversion
>> proceeds to then upload the same file over and over again to the
>> repository.  It seems that the client/server communication could be a little
>> smarter and only send a file once if it is the same file.
>>
>> As I am writing this, I am committing a media directory that contains 57
>> "lessons": media/lesson1, media/lesson2, etc...  I have a 5MB file called
>> lesson.exe in each of these that I had to change.  I have to keep the file
>> in each directory because my company often distributes a single "lesson" and
>> not the entire collection.
>>
>> I am not familiar with the protocol on the back end of subversion but I
>> assume it goes something like this:
>>
>> put file1 at /xyz
>> put file2 at /asdf
>> put file3 at /xyz/asdf
>>
>> Let's assume file1 and file3 are actually the same binary data.  Maybe you
>> could just add some sort of back reference that the server could understand
>> like this:
>>
>> put file1 at /xyz label=A
>> put file2 at /asdf
>> put [A] at /xyz/asdf
>>
>> The client would have to be smart enough to determine which files match and
>> could inform the server in this manner.  Obviously, this is only a suggested
>> implementation as I am not familiar with the protocol.
>>
>> Thanks (if nothing else, just for reading this),
>> Phil
>>
>>
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1420005
>



-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1420062


Re: Enhancement suggestion...

Posted by Ben Collins-Sussman <su...@red-bean.com>.
Side note:  have you considered using symlinks?  Make file 1 "real",
but then make all the other copies of it just unix symlinks to the
original?  (This would only work if you're sure that people never
check out tiny subtrees of the repository.)

On Wed, Mar 25, 2009 at 3:40 PM, Phil Cross <pc...@pixeltecs.com> wrote:
> I'm not sure where to post a suggestion for future versions but this is
> definitely not a "bug".  If this isn't the place or if you guys have already
> thought of it, please let me know.
>
> I know that svn is typically used for source code control and not binary
> files, so this may not be a common usage.  I have a library of multimedia
> that I maintain in version control alongside its source files.  Sometimes I
> have a situation where I have updated some multimedia "player" that is used
> in several locations.  Then, I updated all the directories where the player
> resides with the newest player.  Then I commit the revision.  Subversion
> proceeds to then upload the same file over and over again to the
> repository.  It seems that the client/server communication could be a little
> smarter and only send a file once if it is the same file.
>
> As I am writing this, I am committing a media directory that contains 57
> "lessons": media/lesson1, media/lesson2, etc...  I have a 5MB file called
> lesson.exe in each of these that I had to change.  I have to keep the file
> in each directory because my company often distributes a single "lesson" and
> not the entire collection.
>
> I am not familiar with the protocol on the back end of subversion but I
> assume it goes something like this:
>
> put file1 at /xyz
> put file2 at /asdf
> put file3 at /xyz/asdf
>
> Let's assume file1 and file3 are actually the same binary data.  Maybe you
> could just add some sort of back reference that the server could understand
> like this:
>
> put file1 at /xyz label=A
> put file2 at /asdf
> put [A] at /xyz/asdf
>
> The client would have to be smart enough to determine which files match and
> could inform the server in this manner.  Obviously, this is only a suggested
> implementation as I am not familiar with the protocol.
>
> Thanks (if nothing else, just for reading this),
> Phil
>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1420005