Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@alice-dsl.de> on 2010/05/02 15:34:46 UTC

[Sketch] Don't checksum reliable transmissions

Hi there,

this is not a release-grade patch set but it applies to
/trunk and demonstrates the basic concept.

The idea is the following: ra_local (and possibly others)
are "reliable" in that they won't corrupt transmission.
For now, this implies that we don't need to checksum
the data again. There might be more shortcuts that
can be taken in the future.

ra_local is a relevant use-case for server-side applications
like scripts, indexing applications etc. But it would be
feasible to optionally enable that for LAN use as well.

ReliableRA.patch extends svn_repos_begin_report
with a ra_is_reliable flag, and the ra_local plugin is the only
one to set it to TRUE. The second patch extends the
apply_textdelta member in svn_delta_editor_t with
a calculate_checksum flag. The only places where it
is actually used are export.c and reporter.c.
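
[Editorial sketch, not part of the patch set] The idea of the two
flags can be illustrated with a small standalone C example. It does not
use any Subversion API: transfer_opts_t, transfer_result_t,
sketch_adler32 and transfer_file are hypothetical names made up for
this illustration, and Adler-32 stands in for the real digest only
because it is short. Only the flag names ra_is_reliable and
calculate_checksum are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-transfer settings; NOT a Subversion type.  The field
   name echoes the flag described in the mail. */
typedef struct transfer_opts_t {
  int ra_is_reliable;   /* non-zero if the transport cannot corrupt data */
} transfer_opts_t;

/* Hypothetical result of one transfer in this sketch. */
typedef struct transfer_result_t {
  int has_checksum;     /* 0 when checksumming was skipped */
  uint32_t checksum;    /* only meaningful if has_checksum != 0 */
} transfer_result_t;

/* Adler-32, chosen for brevity only; it is an assumption of this sketch
   and not the digest Subversion uses. */
uint32_t
sketch_adler32(const unsigned char *data, size_t len)
{
  uint32_t a = 1, b = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      a = (a + data[i]) % 65521u;
      b = (b + a) % 65521u;
    }
  return (b << 16) | a;
}

/* Copy LEN bytes from SRC to DST.  The checksum pass over the data is
   skipped when the transport has been declared reliable. */
transfer_result_t
transfer_file(const transfer_opts_t *opts,
              const unsigned char *src, unsigned char *dst, size_t len)
{
  transfer_result_t result = { 0, 0 };
  int calculate_checksum;

  assert(opts != NULL && src != NULL && dst != NULL);

  calculate_checksum = !opts->ra_is_reliable;
  memcpy(dst, src, len);
  if (calculate_checksum)
    {
      result.has_checksum = 1;
      result.checksum = sketch_adler32(dst, len);
    }
  return result;
}
```

In this sketch the saving comes purely from not making the extra pass
over the data; whether that matches the 15% quoted below depends on the
real code paths, which the sketch does not model.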

With all other patches applied, this saves 15% runtime
in 'svn export file://'. However, rev'ing svn_delta_editor_t
seems to be feasible only in the context of 'editor v2'.

-- Stefan^2.

Re: [Sketch] Don't checksum reliable transmissions

Posted by Branko Čibej <br...@xbc.nu>.
On 03.05.2010 23:05, Stefan Fuhrmann wrote:
> Blair Zajac wrote:
>> On 5/2/10 8:56 AM, Philipp Marek wrote:
>>> Hello Stefan!
>>>
>>>> The idea is the following: ra_local (and possibly others)
>>>> are "reliable" in that they won't corrupt transmission.
>>>> For now, this implies that we don't need to checksum
>>>> the data again. There might be more shortcuts that
>>>> can be taken in the future.
>>> ...
>>>> With all other patches applied, this saves 15% runtime
>>>> in 'svn export file://'. However, rev'ing svn_delta_editor_t
>>>> seems only be feasible in context of 'editor v2'.
>>> I understand your point, and 15% sound very interesting.
>>>
>>> But I'd like to note that even for local operations it's a nice
>>> feature to avoid data
>>> corruption via software bugs (off-by-one and similar), and over LAN
>>> it's nice to have,
>>> too (eg. because of bugs in switches and routers, see eg.
>>> http://www.cs.pitt.edu/~kyoungsoo/cs2520/papers/CRC_TCP_Checksum.pdf).
>>>
>>> As the WAN (Internet) uses the same protocols as a LAN (TCP, IP)
>>> following that argument
>>> would mean that the checksum is informational only, and doesn't need
>>> to be checked.
>>>
>>>
>>> So I'd propose that this would be configurable for the paranoid
>>> people (like me, maybe ;-).
>>
>> Given the size of the patch, reving of the API and that I like the
>> checksum to always be done, I would just have svn always perform the
>> checksums.
>>
> But you know that today ra_local does NOT checksum
> for certain operations, at least for single file gets / exports.
> However, I can see the point in making it a client-side option -
> maybe on a per-server basis just like everything else in
> the 'servers' config file.

Doing a checksum on the client side is sane and expected when you
receive data from the server. It opens a security hole if you stop the
server from checksumming the data it receives in turn.

Re: [Sketch] Don't checksum reliable transmissions

Posted by Stefan Fuhrmann <st...@alice-dsl.de>.
Blair Zajac wrote:
> On 5/2/10 8:56 AM, Philipp Marek wrote:
>> Hello Stefan!
>>
>>> The idea is the following: ra_local (and possibly others)
>>> are "reliable" in that they won't corrupt transmission.
>>> For now, this implies that we don't need to checksum
>>> the data again. There might be more shortcuts that
>>> can be taken in the future.
>> ...
>>> With all other patches applied, this saves 15% runtime
>>> in 'svn export file://'. However, rev'ing svn_delta_editor_t
>>> seems only be feasible in context of 'editor v2'.
>> I understand your point, and 15% sound very interesting.
>>
>> But I'd like to note that even for local operations it's a nice 
>> feature to avoid data
>> corruption via software bugs (off-by-one and similar), and over LAN 
>> it's nice to have,
>> too (eg. because of bugs in switches and routers, see eg.
>> http://www.cs.pitt.edu/~kyoungsoo/cs2520/papers/CRC_TCP_Checksum.pdf).
>>
>> As the WAN (Internet) uses the same protocols as a LAN (TCP, IP) 
>> following that argument
>> would mean that the checksum is informational only, and doesn't need 
>> to be checked.
>>
>>
>> So I'd propose that this would be configurable for the paranoid 
>> people (like me, maybe ;-).
>
> Given the size of the patch, reving of the API and that I like the 
> checksum to always be done, I would just have svn always perform the 
> checksums.
>
But you know that today ra_local does NOT checksum
for certain operations, at least for single file gets / exports.
However, I can see the point in making it a client-side option -
maybe on a per-server basis just like everything else in
the 'servers' config file.

-- Stefan^2.


Re: [Sketch] Don't checksum reliable transmissions

Posted by Blair Zajac <bl...@orcaware.com>.
On 5/2/10 8:56 AM, Philipp Marek wrote:
> Hello Stefan!
>
>> The idea is the following: ra_local (and possibly others)
>> are "reliable" in that they won't corrupt transmission.
>> For now, this implies that we don't need to checksum
>> the data again. There might be more shortcuts that
>> can be taken in the future.
> ...
>> With all other patches applied, this saves 15% runtime
>> in 'svn export file://'. However, rev'ing svn_delta_editor_t
>> seems only be feasible in context of 'editor v2'.
> I understand your point, and 15% sound very interesting.
>
> But I'd like to note that even for local operations it's a nice feature to avoid data
> corruption via software bugs (off-by-one and similar), and over LAN it's nice to have,
> too (eg. because of bugs in switches and routers, see eg.
> http://www.cs.pitt.edu/~kyoungsoo/cs2520/papers/CRC_TCP_Checksum.pdf).
>
> As the WAN (Internet) uses the same protocols as a LAN (TCP, IP) following that argument
> would mean that the checksum is informational only, and doesn't need to be checked.
>
>
> So I'd propose that this would be configurable for the paranoid people (like me, maybe ;-).

Given the size of the patch, the rev'ing of the API, and that I like the checksum to
always be done, I would just have svn always perform the checksums.

Blair

Re: [Sketch] Don't checksum reliable transmissions

Posted by Philipp Marek <ph...@marek.priv.at>.
Hello Stefan!

> The idea is the following: ra_local (and possibly others)
> are "reliable" in that they won't corrupt transmission.
> For now, this implies that we don't need to checksum
> the data again. There might be more shortcuts that
> can be taken in the future.
...
> With all other patches applied, this saves 15% runtime
> in 'svn export file://'. However, rev'ing svn_delta_editor_t
> seems only be feasible in context of 'editor v2'.
I understand your point, and 15% sounds very interesting.

But I'd like to note that even for local operations it's a nice feature to avoid data
corruption via software bugs (off-by-one and similar), and over LAN it's nice to have,
too (e.g. because of bugs in switches and routers; see e.g.
http://www.cs.pitt.edu/~kyoungsoo/cs2520/papers/CRC_TCP_Checksum.pdf).
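
[Editorial sketch] The software-bug argument can be made concrete with
a small standalone C example. Nothing here is Subversion code:
sketch_adler32, buggy_copy, correct_copy and verify_copy are
hypothetical names, and Adler-32 is assumed only because it is short.
A copy routine with an off-by-one error silently drops the last byte,
and comparing a checksum of the received data against the sender's
checksum is what exposes it, even though no network was involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adler-32, chosen for brevity only; an assumption of this sketch. */
uint32_t
sketch_adler32(const unsigned char *data, size_t len)
{
  uint32_t a = 1, b = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      a = (a + data[i]) % 65521u;
      b = (b + a) % 65521u;
    }
  return (b << 16) | a;
}

/* Deliberately buggy copy: the loop stops one byte short, so the last
   byte of DST is never written. */
void
buggy_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
  size_t i;

  for (i = 0; i + 1 < len; i++)   /* BUG: should be i < len */
    dst[i] = src[i];
}

/* Correct copy, for comparison. */
void
correct_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
  memcpy(dst, src, len);
}

/* Return 1 if the checksum of the LEN bytes at DST equals EXPECTED
   (the checksum computed by the sender), 0 otherwise. */
int
verify_copy(const unsigned char *dst, size_t len, uint32_t expected)
{
  assert(dst != NULL);
  return sketch_adler32(dst, len) == expected;
}
```

With the checksum skipped, the truncated copy in this sketch would be
accepted without any error being reported.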

As the WAN (Internet) uses the same protocols as a LAN (TCP, IP), following that argument
would mean that the checksum is informational only and doesn't need to be checked.


So I'd propose that this would be configurable for the paranoid people (like me, maybe ;-).


Regards,

Phil


-- 
Versioning your /etc, /home or even your whole installation?
             Try fsvs (fsvs.tigris.org)!