Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2017/02/14 15:53:16 UTC
FSFS instance-id and on-disk representation
TL;DR: the repository "instance-id" introduced in FSFS f7 doesn't make
any difference to the on-disk representation of FSFS. Can we please
affirm that this will continue to be so?
== What is this instance-id? ==
(A brief summary for those who, like me, didn't know what this is.)
In Subversion 1.9 we introduced a repository instance-id in FSFS f7. It
is stored as a second line in the "db/uuid" file. The log message tries
to explain why:
https://svn.apache.org/r1618138
Basically it is to disambiguate some potentially shared data in two
svn_fs_t objects opened to repositories that have the same (primary)
repository UUID. I am still not clear exactly what shared data it is
used for and among which processes that data can be shared.
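To illustrate the file layout described above, here is a minimal Python
sketch that reads both identifiers from "db/uuid". It assumes only what
is stated here: the first line is the repository UUID and, in f7, the
second line is the instance-id. The names RepoIds and read_repo_ids are
hypothetical helpers, not part of any Subversion API.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class RepoIds(NamedTuple):
    uuid: str
    # None when the file has only one line (no instance-id present).
    instance_id: Optional[str]


def read_repo_ids(repo_root):
    """Read the UUID (line 1) and instance-id (line 2, if any) from db/uuid."""
    lines = (Path(repo_root) / "db" / "uuid").read_text().splitlines()
    if not lines:
        raise ValueError("db/uuid is empty")
    uuid = lines[0].strip()
    instance_id = lines[1].strip() if len(lines) > 1 else None
    return RepoIds(uuid, instance_id)
```

Two copies of a repository made by a plain file copy would return
identical values from this helper for both fields.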
The log message also mentions some scenarios where having different
instance-ids is important (if they have the same primary UUID). Three of
these that I would like to mention here are:
* during "svnadmin hotcopy repo1 repo2"
* during "svnadmin freeze repo1 (svnadmin freeze repo2 (...))"
* serving repo1 and repo2 from the same Apache httpd instance
(in some configurations)
The second email thread linked from that log message contains most of
the interesting discussion:
http://svn.haxx.se/dev/archive-2014-08/0093.shtml
== Why it matters ==
WANdisco's Svn Multisite Plus (MSP) replicates and synchronizes
Subversion repositories, using rsync initially, then through their own
synchronization software. Until now those replicas have been bit-for-bit
identical, and consistency checking has included checking that
repositories remain bit-for-bit identical.
I'm aware that we don't guarantee a repository will be bitwise
predictable (and thus two instances remain bit-for-bit identical) when
written to. But it has been, under these conditions, and this has been
useful.
Replicas are generally kept on physically separate servers, and served
by separate Apache httpd instances. Two replicas are never accessed
together by the same process, in normal use. It is unlikely but
conceivable that an administrator might encounter one of the scenarios
where the instance-id matters.
WANdisco asked me to advise on what to do. It seems the correct thing to
do with "instance-id" is to make that field deliberately different on
each instance of the repository. In consequence the consistency check
will have to be made to ignore differences in the instance-id line in
the "db/uuid" file.
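A consistency check of that kind could be sketched roughly as follows.
This is only an illustration in Python, not MSP's actual check: it
hashes every file under two repository directories and, for "db/uuid"
alone, hashes just the first line so that the instance-id line is
ignored. The function names are hypothetical, and a real check would
probably also need to decide how to treat transient files.

```python
import hashlib
from pathlib import Path

# Relative path of the only file allowed to differ (in its second line).
UUID_FILE = Path("db") / "uuid"


def file_digest(path, first_line_only=False):
    """SHA-256 of a file; optionally of its first line only."""
    data = path.read_bytes()
    if first_line_only:
        # Keep the UUID line, drop the instance-id line and anything after.
        data = data.split(b"\n", 1)[0]
    return hashlib.sha256(data).hexdigest()


def repo_fingerprint(root):
    """Map each file's relative path to its digest."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root)
            result[rel.as_posix()] = file_digest(path, rel == UUID_FILE)
    return result


def replicas_match(repo_a, repo_b):
    """True if both trees are identical apart from the instance-id line."""
    return repo_fingerprint(repo_a) == repo_fingerprint(repo_b)
```

With this approach a difference in the primary UUID, or in any other
file, is still reported as a mismatch.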
== Question ==
WANdisco would like to know that there will not be differences in the
repository on-disk data due to differences in instance-id (other than
the "db/uuid" itself, of course). I suggest we are talking about the
lifetime of FSFS format 7; of course the features of a future format are
unknown.
Can I tell them that that is the expectation, and we won't change that
situation without a good reason?
- Julian
Re: FSFS instance-id and on-disk representation
Posted by Doug Robinson <do...@wandisco.com>.
Evgeny:
Thank you.
Doug
Re: FSFS instance-id and on-disk representation
Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Julian Foad <ju...@apache.org> writes:
> == Question ==
>
> WANdisco would like to know that there will not be differences in the
> repository on-disk data due to differences in instance-id (other than the
> "db/uuid" itself, of course). I suggest we are talking about the lifetime of
> FSFS format 7; of course the features of a future format are unknown.
>
> Can I tell them that that is the expectation, and we won't change that
> situation without a good reason?
The instance ID was added to handle a case when two repositories with
the same UUID (say, one was hotcopied or dump/loaded from another) are
opened within a single process.
Without the instance ID, these repositories share internal data that should
not be shared, such as the transaction list and mutexes. This can result
in various types of errors or deadlocks. An instance ID makes it possible
to distinguish the internal data for such near-duplicate repositories, and
is not used anywhere else.
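The problem described above can be pictured with a small Python sketch.
This is not Subversion's implementation (which is in C); it is a
hypothetical model of a per-process registry of shared state, with
made-up names, showing why a key based on the UUID alone makes two
near-duplicate repositories share a mutex and transaction list, while
adding the instance ID to the key keeps them apart.

```python
import threading


class SharedState:
    """Per-repository data shared by all handles within one process."""

    def __init__(self):
        self.write_lock = threading.Lock()
        self.open_txns = []


_registry = {}
_registry_guard = threading.Lock()


def get_shared_state(uuid, instance_id=None):
    """Return the shared state for a repository, creating it on first use.

    With instance_id=None the key is effectively the UUID alone, so two
    repositories with the same UUID receive the same object.
    """
    key = (uuid, instance_id)
    with _registry_guard:
        state = _registry.get(key)
        if state is None:
            state = _registry[key] = SharedState()
        return state


# Keyed by UUID only: a hotcopy and its source end up sharing one lock.
assert get_shared_state("uuid-1") is get_shared_state("uuid-1")

# Keyed by UUID plus instance ID: each repository gets its own state.
assert get_shared_state("uuid-1", "inst-a") is not get_shared_state("uuid-1", "inst-b")
```

In this model the instance ID only affects in-memory lookup keys, which
is consistent with it having no effect on what is written to disk.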
Answering the question, it is safe to assume that an instance ID doesn't
change what gets written to the disk (apart from the 'db/uuid' contents),
and that this will not change in format 7.
Hope this helps :)
Regards,
Evgeny Kotkov