You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Ben Reser <be...@reser.org> on 2014/09/02 19:28:07 UTC
Re: [PATCH] Introduce per-instance filesystem UUIDs
On 8/20/14 9:13 AM, Ben Reser wrote:
> I think part of the problem here has been we (as in WANdisco folks) have
> discussed the idea of an instance ID for repositories in the past to solve the
> range of replacing the repository without clearing the cache issues. But this
> change is being added for a very different reason.
>
> Evgeny has implemented the instance ID for the purpose of solving the problem
> of two different repositories not being able to be locked if they happen to
> have the same UUID. This happens because we use a mutex to handle locking
> between threads and that mutex can't distinguish between different repositories
> with identical UUIDs.
>
> Currently the code on trunk adds the instance ID to the cache keys. I'm not
> sure we should be doing that (though both brane and stefan2 requested that be
> done). As per the discussion today at the SHF hackathon the instance ID can't
> resolve the failure to clear the cache issues. The best it can do is narrow
> the window for these issues to exist. That would seem like a good thing but I
> think it creates a huge false sense of security. We will ultimately have
> someone that comes along with a corrupted repository, we're going to say you
> replaced the repo while the server was running and the user is going to say
> "But I've been doing this for years without any problem."
>
> Without the instance ID in the cache keys users are unlikely to actually
> corrupt their repository (just like they would be with them, it's a pretty hard
> race to hit). But they are likely to get errors related to the cache being
> stale. This gives them a giant hint that what they're doing is wrong and gives
> us an opportunity to educate them.
Evgeny do you have any thoughts on this? I think it's probably best to remove
the instance id from the cache keys. I believe brane and stefan2 were
convinced of this at the hackathon.
Re: [PATCH] Introduce per-instance filesystem UUIDs
Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:
> Ben Reser <be...@reser.org> writes:
>
>>> I think part of the problem here has been we (as in WANdisco folks) have
>>> discussed the idea of an instance ID for repositories in the past to solve
>>> the range of replacing the repository without clearing the cache issues.
>>> But this change is being added for a very different reason.
>>>
>>> Evgeny has implemented the instance ID for the purpose of solving the
>>> problem of two different repositories not being able to be locked if they
>>> happen to have the same UUID. This happens because we use a mutex to handle
>>> locking between threads and that mutex can't distinguish between different
>>> repositories with identical UUIDs.
>>>
>>> Currently the code on trunk adds the instance ID to the cache keys. I'm not
>>> sure we should be doing that (though both brane and stefan2 requested that
>>> be done). As per the discussion today at the SHF hackathon the instance ID
>>> can't resolve the failure to clear the cache issues. The best it can do is
>>> narrow the window for these issues to exist. That would seem like a good
>>> thing but I think it creates a huge false sense of security. We will
>>> ultimately have someone that comes along with a corrupted repository, we're
>>> going to say you replaced the repo while the server was running and the
>>> user is going to say "But I've been doing this for years without any
>>> problem."
>>>
>>> Without the instance ID in the cache keys users are unlikely to actually
>>> corrupt their repository (just like they would be with them, it's a pretty
>>> hard race to hit). But they are likely to get errors related to the cache
>>> being stale. This gives them a giant hint that what they're doing is wrong
>>> and gives us an opportunity to educate them.
>>
>> Evgeny do you have any thoughts on this? I think it's probably best to
>> remove the instance id from the cache keys. I believe brane and stefan2
>> were convinced of this at the hackathon.
>
> Sounds good to me.
>
> However, prior to (re-)narrowing the scope of this change, I would love to
> spend a bit of time and actually reproduce the repository corruption race
> (with or without instance IDs) — just in order to obtain a bit of information,
> which I currently lack. I plan to undo the corresponding part of the
> changeset right after that.
Done in r1623402.
Regards,
Evgeny Kotkov
Re: [PATCH] Introduce per-instance filesystem UUIDs
Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Ben Reser <be...@reser.org> writes:
>> I think part of the problem here has been we (as in WANdisco folks) have
>> discussed the idea of an instance ID for repositories in the past to solve
>> the range of replacing the repository without clearing the cache issues.
>> But this change is being added for a very different reason.
>>
>> Evgeny has implemented the instance ID for the purpose of solving the problem
>> of two different repositories not being able to be locked if they happen to
>> have the same UUID. This happens because we use a mutex to handle locking
>> between threads and that mutex can't distinguish between different
>> repositories with identical UUIDs.
>>
>> Currently the code on trunk adds the instance ID to the cache keys. I'm not
>> sure we should be doing that (though both brane and stefan2 requested that be
>> done). As per the discussion today at the SHF hackathon the instance ID
>> can't resolve the failure to clear the cache issues. The best it can do is
>> narrow the window for these issues to exist. That would seem like a good
>> thing but I think it creates a huge false sense of security. We will
>> ultimately have someone that comes along with a corrupted repository, we're
>> going to say you replaced the repo while the server was running and the user
>> is going to say "But I've been doing this for years without any problem."
>>
>> Without the instance ID in the cache keys users are unlikely to actually
>> corrupt their repository (just like they would be with them, it's a pretty
>> hard race to hit). But they are likely to get errors related to the cache
>> being stale. This gives them a giant hint that what they're doing is wrong
>> and gives us an opportunity to educate them.
>
> Evgeny do you have any thoughts on this? I think it's probably best to remove
> the instance id from the cache keys. I believe brane and stefan2 were
> convinced of this at the hackathon.
Sounds good to me.
However, prior to (re-)narrowing the scope of this change, I would love to
spend a bit of time and actually reproduce the repository corruption race (with
or without instance IDs) — just in order to obtain a bit of information, which
I currently lack. I plan to undo the corresponding part of the changeset right
after that.
Regards,
Evgeny Kotkov