Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2009/04/02 01:46:02 UTC
representational problem in the schema
Hey Hyrum,
I'm trying to work out the discrepancy in how we handle the COPIED
flag to satisfy both revert/17 and update/54. I figured out an
approach, coded something up, and it works. But then I started
thinking...
I should probably stop doing that. :-P
Consider the following BASE tree:
A
A/B
A/B/foo
A/C
Next, I delete the B directory. So we create two rows in WORKING:
A/B, presence=not-present
A/B/foo, presence=not-present
All good. Now, I copy A/C@rev to A/B. Now we have something like:
A/B, copyfrom=blah
A/B/foo, presence=not-present
Now. Just given the above data, talk to me about A/B/foo.
<pause>
Well.... there are two possibilities:
1) A/B/foo represents the deletion of the BASE node
2) A/B/foo represents a deletion of A/C/foo post-copy
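To make the ambiguity concrete, here is a minimal sketch in Python. It models WORKING as a plain dict keyed by path. The helper names, the dict layout, and the "normal" presence on the copied node are assumptions made for illustration, not the real schema or any Subversion API. The second history also assumes A/C contains a child foo, which the BASE listing above does not show. Both histories end with identical rows:

```python
# Hypothetical model of the WORKING tree: {path: {column: value}}.
# The helper names are illustrative only; they are not Subversion APIs.

def delete_tree(working, paths):
    """Mark each path as deleted with presence=not-present."""
    for path in paths:
        working[path] = {"presence": "not-present"}

def copy_onto(working, dst, copyfrom, copied_children=()):
    """Overwrite dst with a copied node and add rows for children the copy brings in."""
    working[dst] = {"presence": "normal", "copyfrom": copyfrom}
    for name in copied_children:
        working[dst + "/" + name] = {"presence": "normal"}

# History 1: A/C has no children. Delete A/B, then copy A/C over it.
# The A/B/foo row still records the deletion of the BASE node.
h1 = {}
delete_tree(h1, ["A/B", "A/B/foo"])
copy_onto(h1, "A/B", "A/C@rev")

# History 2 (assumption: A/C contains foo). Same start, but the copy brings in
# A/B/foo, which is then deleted post-copy.
h2 = {}
delete_tree(h2, ["A/B", "A/B/foo"])
copy_onto(h2, "A/B", "A/C@rev", copied_children=["foo"])
delete_tree(h2, ["A/B/foo"])

# The stored rows cannot tell the two histories apart.
same_rows = (h1 == h2)
```

Nothing in the rows themselves records which of the two deletions A/B/foo stands for.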
I've been thinking about the best way to represent this in our schema.
I've come up with this approach:
* leave it untouched, but interpret case (2) as distinguished using
copyfrom records. we apply copyfrom info to the root of the deletion
from the copied subtree. if no copyfrom info exists on the deletion
root, then it represents a deletion from BASE
But this feels a bit fragile. Something could come along and overwrite
that node with other data, much like we overwrote A/B.
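A rough sketch of that copyfrom-based interpretation, and of why it is fragile, might look like the following. Everything here is hypothetical: the dict layout, the function names, and the way the deletion root is found (walking up while the parent row is also not-present) are assumptions for illustration, not the actual code.

```python
# Hypothetical WORKING rows as {path: {column: value}}; not the real schema.

def deletion_root(working, path):
    """Walk up from path while the parent row is also not-present."""
    root = path
    while "/" in root:
        parent = root.rsplit("/", 1)[0]
        if working.get(parent, {}).get("presence") != "not-present":
            break
        root = parent
    return root

def deleted_from(working, path):
    """Interpret a deletion by looking for copyfrom info on its root."""
    root = deletion_root(working, path)
    return "copy" if "copyfrom" in working[root] else "BASE"

# Case (2): the deletion root inside the copied subtree carries copyfrom info.
working = {
    "A/B": {"presence": "normal", "copyfrom": "A/C@rev"},
    "A/B/foo": {"presence": "not-present", "copyfrom": "A/C/foo@rev"},
}
before = deleted_from(working, "A/B/foo")

# Fragility: any code that rewrites the row without preserving copyfrom
# silently flips the interpretation to a BASE deletion.
working["A/B/foo"] = {"presence": "not-present"}
after = deleted_from(working, "A/B/foo")
```

The answer depends on a second column staying intact on one particular row, which is the kind of implicit coupling that is easy to break.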
So. Something fully orthogonal. New column, or record "deletion roots"
or something...
After consideration, I'm thinking a new presence value of
"base-deleted". In the above two possibilities:
1) presence="base-deleted" means A/B/foo refers to the BASE
2) presence="not-present" means A/B/foo is a deletion of A/C/foo post-copy
Everywhere we talk about not-present, we'd have to extend a bit, and
probably adjust scan_working and crap.
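As a sketch of how the extra presence value would separate the two cases, here is a small Python model. The enum, the dict layout, and describe() are assumptions for illustration only; the real change would live in the schema and in whatever code reads presence today.

```python
from enum import Enum

class Presence(Enum):
    NORMAL = "normal"
    NOT_PRESENT = "not-present"
    BASE_DELETED = "base-deleted"  # the proposed new value

def describe(row):
    """Interpret a WORKING row from its presence alone (hypothetical helper)."""
    if row["presence"] is Presence.BASE_DELETED:
        return "deletion of the BASE node"
    if row["presence"] is Presence.NOT_PRESENT:
        return "deletion of a copied node"
    return "not a deletion"

# Case (1): A/B/foo was deleted from BASE before A/C was copied over A/B.
case1 = {
    "A/B": {"presence": Presence.NORMAL, "copyfrom": "A/C@rev"},
    "A/B/foo": {"presence": Presence.BASE_DELETED},
}

# Case (2): A/B/foo came in with the copy and was deleted afterwards.
case2 = {
    "A/B": {"presence": Presence.NORMAL, "copyfrom": "A/C@rev"},
    "A/B/foo": {"presence": Presence.NOT_PRESENT},
}

answer1 = describe(case1["A/B/foo"])
answer2 = describe(case2["A/B/foo"])
```

The point of the design is that the row answers the question by itself, with no need to look at copyfrom or at neighboring rows.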
Oh, and since A/B "shadows" a BASE node, then we implicitly know that
a deletion happened first, before <whatever> brought A/B into the
WORKING tree there (a replacement).
Thoughts?
Cheers,
-g
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1512688
Re: representational problem in the schema
Posted by Greg Stein <gs...@gmail.com>.
On Thu, Apr 2, 2009 at 05:35, Hyrum K. Wright
<hy...@mail.utexas.edu> wrote:
> On Apr 1, 2009, at 8:46 PM, Greg Stein wrote:
>
>> Hey Hyrum,
>>
>> I'm trying to work out the discrepancy in how we handle the COPIED
>> flag to satisfy both revert/17 and update/54. I figured out an
>> approach, coded something up, and it works. But then I started
>> thinking...
>>
>> I should probably stop doing that. :-P
>
> Yeah. I was thinking today that it's amazing that we're on the brink of
> getting >1100 test cases working with a *completely new* entries storage
> paradigm. That's pretty crazy.
hehe... yeah :-)
> But I'm scared about the umpteen billion of
> other weird cases which we don't test, and at some point, the workaround
> hacks will bite us.
>
> Then again, we can only go so far to ensure perfect compat. We can't be, nor
> should we be, perfectly bug compatible.
Agreed. I think if we get the 1100 tests to pass, plus the extra test
for the read_entries() paths that we've talked about... then we've
done well more than I think anybody would ask for. And I think that's
just about Right :-)
>...
>> After consideration, I'm thinking a new presence value of
>> "base-deleted". In the above two possibilities:
>>
>> 1) presence="base-deleted" means A/B/foo refers to the BASE
>> 2) presence="not-present" means A/B/foo is a deletion of A/C/foo post-copy
>>
>> Everywhere we talk about not-present, we'd have to extend a bit, and
>> probably adjust scan_working and crap.
>
> This sounds reasonable. Are there additional scenarios where we're going to
> have to think about even more presence values, or does this just about cover
> it?
Oh, this should be it. Recognize that our schema fully supports our
client-side operation today. That's why we've been able to come so
far.
"should be" isn't "definitely" though. This base-deleted issue has
arisen because we're near the boundary of simply storing
representational stuff and that of *intent*. That is, what to do at
commit time. When we revamp the commit logic, then we may discover
other situations. We may also find a schema change is in order to
better support libsvn_client operations.
>> Oh, and since A/B "shadows" a BASE node, then we implicitly know that
>> a deletion happened first, before <whatever> brought A/B into the
>> WORKING tree there (a replacement).
>
> Sure, but will this always be the case, or are there cases where we
> could get into a situation where the parent isn't shadowing a BASE node, and
> therefore can't make this implicit assumption? I can't think of any.
Every BASE node needs a parent. Therefore, if a WORKING node has a
parent WORKING node, then it, too, shadows a BASE node. We should be
fine there.
Okay. I'm going to start on the base-deleted presence value, then
return to the read_entries() testing.
(there is also a read_entries() simplification that I'd like to do;
left a comment in there in my last commit)
Cheers,
-g
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1517275
Re: representational problem in the schema
Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Apr 1, 2009, at 8:46 PM, Greg Stein wrote:
> Hey Hyrum,
>
> I'm trying to work out the discrepancy in how we handle the COPIED
> flag to satisfy both revert/17 and update/54. I figured out an
> approach, coded something up, and it works. But then I started
> thinking...
>
> I should probably stop doing that. :-P
Yeah. I was thinking today that it's amazing that we're on the brink
of getting >1100 test cases working with a *completely new* entries
storage paradigm. That's pretty crazy. But I'm scared about the
umpteen billion of other weird cases which we don't test, and at some
point, the workaround hacks will bite us.
Then again, we can only go so far to ensure perfect compat. We can't be,
nor should we be, perfectly bug compatible.
> Consider the following BASE tree:
>
> A
> A/B
> A/B/foo
> A/C
>
> Next, I delete the B directory. So we create two rows in WORKING:
>
> A/B, presence=not-present
> A/B/foo, presence=not-present
>
> All good. Now, I copy A/C@rev to A/B. Now we have something like:
>
> A/B, copyfrom=blah
> A/B/foo, presence=not-present
>
> Now. Just given the above data, talk to me about A/B/foo.
>
> <pause>
*crickets*
I see what the problem could be, and it's exactly what you describe
below.
> Well.... there are two possibilities:
>
> 1) A/B/foo represents the deletion of the BASE node
> 2) A/B/foo represents a deletion of A/C/foo post-copy
>
>
> I've been thinking about the best way to represent this in our schema.
> I've come up with this approach:
>
> * leave it untouched, but interpret case (2) as distinguished using
> copyfrom records. we apply copyfrom info to the root of the deletion
> from the copied subtree. if no copyfrom info exists on the deletion
> root, then it represents a deletion from BASE
>
> But this feels a bit fragile. Something could come along and overwrite
> that node with other data, much like we overwrote A/B.
>
> So. Something fully orthogonal. New column, or record "deletion roots"
> or something...
>
> After consideration, I'm thinking a new presence value of
> "base-deleted". In the above two possibilities:
>
> 1) presence="base-deleted" means A/B/foo refers to the BASE
> 2) presence="not-present" means A/B/foo is a deletion of A/C/foo
> post-copy
>
> Everywhere we talk about not-present, we'd have to extend a bit, and
> probably adjust scan_working and crap.
This sounds reasonable. Are there additional scenarios where we're
going to have to think about even more presence values, or does this
just about cover it?
> Oh, and since A/B "shadows" a BASE node, then we implicitly know that
> a deletion happened first, before <whatever> brought A/B into the
> WORKING tree there (a replacement).
Sure, but will this always be the case, or are there cases where we
could get into a situation where the parent isn't shadowing a BASE
node, and therefore can't make this implicit assumption? I can't
think of any.
-Hyrum
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1513479
Re: representational problem in the schema
Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 02, 2009 at 03:46:02AM +0200, Greg Stein wrote:
> Consider the following BASE tree:
>
> A
> A/B
> A/B/foo
> A/C
>
> Next, I delete the B directory. So we create two rows in WORKING:
>
> A/B, presence=not-present
> A/B/foo, presence=not-present
>
> All good. Now, I copy A/C@rev to A/B. Now we have something like:
>
> A/B, copyfrom=blah
> A/B/foo, presence=not-present
>
> Now. Just given the above data, talk to me about A/B/foo.
>
> <pause>
>
> Well.... there are two possibilities:
>
> 1) A/B/foo represents the deletion of the BASE node
> 2) A/B/foo represents a deletion of A/C/foo post-copy
This was a big design problem in wc-1. Erik has mentioned it
to me a few times. Thanks for fixing it. And as far as I can
tell your approach sounds sane. And yes, please try to avoid
implicit meanings of fields, i.e. don't make fields in the DB
carry additional meaning if they occur together in some combination.
It's too confusing for people coming late to the party.
The easier to understand the DB schema is, the better.
Stefan