Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2017/02/14 15:53:16 UTC

FSFS instance-id and on-disk representation

TL;DR: the repository "instance-id" introduced in FSFS f7 doesn't make 
any difference to the on-disk representation of FSFS; can we please 
affirm that this will continue to be so?

== What is this instance-id? ==

(A brief summary for those who, like me, didn't know what this is.)

In Subversion 1.9 we introduced a repository instance-id in FSFS f7. It 
is stored as the second line of the "db/uuid" file. The log message 
tries to explain why:

   https://svn.apache.org/r1618138
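
To make the layout concrete, here is a minimal sketch (not Subversion 
code) that reads the two identifiers from "db/uuid". It assumes only what 
is stated above: the first line holds the repository UUID and, in f7, the 
second line holds the instance-id. The names RepoIds, parse_uuid_file and 
read_repo_ids are made up for this illustration, and treating a missing 
second line as "no instance-id" is an assumption.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class RepoIds(NamedTuple):
    """Identifiers found in a repository's db/uuid file (hypothetical type)."""
    uuid: str
    instance_id: Optional[str]  # None when there is no second line


def parse_uuid_file(text: str) -> RepoIds:
    """Split the contents of db/uuid into (uuid, instance_id)."""
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError("db/uuid is empty")
    uuid = lines[0].strip()
    # Assumption: a missing or blank second line means no instance-id.
    has_second = len(lines) > 1 and lines[1].strip() != ""
    instance_id = lines[1].strip() if has_second else None
    return RepoIds(uuid, instance_id)


def read_repo_ids(repo_root: str) -> RepoIds:
    """Read and parse <repo_root>/db/uuid."""
    path = Path(repo_root) / "db" / "uuid"
    return parse_uuid_file(path.read_text(encoding="ascii"))
```

For two replicas made by rsync, read_repo_ids() would return the same 
uuid and the same instance_id for both, which is the situation discussed 
below.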

Basically it is to disambiguate some potentially shared data in two 
svn_fs_t objects opened to repositories that have the same (primary) 
repository UUID. I am still not clear exactly what shared data it is 
used for and among which processes that data can be shared.

The log message also mentions some scenarios where having different 
instance-ids is important (if they have the same primary UUID). Three of 
these that I would like to mention here are:

   * during "svnadmin hotcopy repo1 repo2"
   * during "svnadmin freeze repo1 (svnadmin freeze repo2 (...))"
   * serving repo1 and repo2 from the same Apache httpd instance
     (in some configurations)

The second email thread linked from that log message contains most of 
the interesting discussion:

   http://svn.haxx.se/dev/archive-2014-08/0093.shtml

== Why it matters ==

WANdisco's Svn Multisite Plus (MSP) replicates and synchronizes 
Subversion repositories, using rsync initially, then through their own 
synchronization software. Until now those replicas have been bit-for-bit 
identical, and consistency checking has included checking that 
repositories remain bit-for-bit identical.

I'm aware that we don't guarantee that a repository will be bitwise 
predictable when written to (and thus that two instances will remain 
bit-for-bit identical). But in practice it has been, under these 
conditions, and this has been useful.

Replicas are generally kept on physically separate servers, and served 
by separate Apache httpd instances. Two replicas are never accessed 
together by the same process, in normal use. It is unlikely but 
conceivable that an administrator might encounter one of the scenarios 
where the instance-id matters.

WANdisco asked me to advise on what to do. It seems the correct thing to 
do with "instance-id" is to make that field deliberately different on 
each instance of the repository. In consequence the consistency check 
will have to be made to ignore differences in the instance-id line in 
the "db/uuid" file.
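
A consistency check of that kind could be sketched roughly as follows. 
This is only an illustration, not WANdisco's actual check: it hashes 
every file under two repository roots and, for "db/uuid" alone, hashes 
just the first line so that a differing instance-id is ignored. The 
function names are hypothetical, and which files a real check should 
cover is not addressed here.

```python
import hashlib
import os
from pathlib import Path

# Relative path of the one file whose second line may legitimately differ.
UUID_FILE = Path("db") / "uuid"


def _digest(path: Path, first_line_only: bool) -> str:
    """SHA-256 of the file, or of its first line only if requested."""
    data = path.read_bytes()
    if first_line_only:
        data = data.split(b"\n", 1)[0]
    return hashlib.sha256(data).hexdigest()


def fingerprint(repo_root) -> dict:
    """Map each file's relative path to its digest."""
    root = Path(repo_root)
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            rel = full.relative_to(root)
            result[rel.as_posix()] = _digest(full, rel == UUID_FILE)
    return result


def differing_paths(repo_a, repo_b) -> list:
    """Relative paths that are missing on one side or differ in content."""
    a, b = fingerprint(repo_a), fingerprint(repo_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from differing_paths() would mean the two replicas match 
everywhere except, possibly, in the instance-id line.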

== Question ==

WANdisco would like to know that there will not be differences in the 
repository on-disk data due to differences in instance-id (other than 
the "db/uuid" itself, of course). I suggest we are talking about the 
lifetime of FSFS format 7; of course the features of a future format are 
unknown.

Can I tell them that that is the expectation, and we won't change that 
situation without a good reason?

- Julian


Re: FSFS instance-id and on-disk representation

Posted by Doug Robinson <do...@wandisco.com>.
Evgeny:

Thank you.

Doug





Re: FSFS instance-id and on-disk representation

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Julian Foad <ju...@apache.org> writes:

> == Question ==
>
> WANdisco would like to know that there will not be differences in the
> repository on-disk data due to differences in instance-id (other than the
> "db/uuid" itself, of course). I suggest we are talking about the lifetime of
> FSFS format 7; of course the features of a future format are unknown.
>
> Can I tell them that that is the expectation, and we won't change that
> situation without a good reason?

The instance ID was added to handle the case where two repositories with
the same UUID (say, one was hotcopied or dump/loaded from another) are
opened within a single process.

Without the instance ID, these repositories share internal data that should
not be shared, such as the transaction list and mutexes.  This can result
in various types of errors or deadlocks.  An instance ID makes it possible
to distinguish the internal data for such near-duplicate repositories, and
is not used anywhere else.
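
The effect can be pictured with a small sketch. This is not the FSFS 
implementation, only an illustration of the idea: per-process shared 
state is looked up by a key, and adding the instance ID to that key keeps 
two repositories with the same UUID apart. The names SharedData and 
shared_data_for, and the "uuid:instance_id" key format, are assumptions 
made for this example.

```python
import threading
from typing import Dict, List, Optional


class SharedData:
    """Stand-in for per-repository state shared within one process."""

    def __init__(self) -> None:
        self.write_lock = threading.Lock()   # stand-in for the mutexes
        self.transactions: List[str] = []    # stand-in for the transaction list


_registry: Dict[str, SharedData] = {}
_registry_lock = threading.Lock()


def shared_data_for(uuid: str, instance_id: Optional[str] = None) -> SharedData:
    """Return the process-wide SharedData for a repository.

    Without an instance ID the key is the UUID alone, so two repositories
    with the same UUID end up with the same object.
    """
    key = uuid if instance_id is None else f"{uuid}:{instance_id}"
    with _registry_lock:
        if key not in _registry:
            _registry[key] = SharedData()
        return _registry[key]
```

With the UUID alone as the key, a hotcopy and its source would get the 
same lock object here; with distinct instance IDs they get separate ones.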

Answering the question, it is safe to assume that an instance ID doesn't
change what gets written to the disk (apart from the 'db/uuid' contents),
and that this will not change in format 7.

Hope this helps :)


Regards,
Evgeny Kotkov