Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2009/04/02 01:46:02 UTC

representational problem in the schema

Hey Hyrum,

I'm trying to work out the discrepancy in how we handle the COPIED
flag to satisfy both revert/17 and update/54. I figured out an
approach, coded something up, and it works. But then I started
thinking...

I should probably stop doing that. :-P

Consider the following BASE tree:

  A
  A/B
  A/B/foo
  A/C

Next, I delete the B directory. So we create two rows in WORKING:

  A/B, presence=not-present
  A/B/foo, presence=not-present

All good. Now, I copy A/C@rev to A/B. Now we have something like:

  A/B, copyfrom=blah
  A/B/foo, presence=not-present

Now. Just given the above data, talk to me about A/B/foo.

<pause>

Well.... there are two possibilities:

1) A/B/foo represents the deletion of the BASE node
2) A/B/foo represents a deletion of A/C/foo post-copy


I've been thinking about the best way to represent this in our schema.
I've come up with this approach:

* leave the schema untouched, and distinguish case (2) using copyfrom
records: we apply copyfrom info to the root of a deletion within the
copied subtree. If no copyfrom info exists on the deletion root, then
it represents a deletion from BASE.

But this feels a bit fragile. Something could come along and overwrite
that node with other data, much like we overwrote A/B.
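A minimal sketch of why the copyfrom-based interpretation feels fragile. This is not the wc-ng code: the row layout (a dict with "presence" and "copyfrom" keys), the function name classify_deletion, and the copyfrom value "A/C/foo@rev" are all assumptions made up for illustration.

```python
def classify_deletion(row):
    """Classify a not-present WORKING row by looking only at its copyfrom field."""
    if row["presence"] != "not-present":
        raise ValueError("row does not record a deletion")
    return "deleted-from-copy" if row.get("copyfrom") else "deleted-from-base"


# Deletion root inside a copied subtree, tagged with (hypothetical) copyfrom info.
deletion_root = {"presence": "not-present", "copyfrom": "A/C/foo@rev"}
before = classify_deletion(deletion_root)

# Some later operation rewrites the row and does not carry copyfrom along.
deletion_root = {"presence": "not-present", "copyfrom": None}
after = classify_deletion(deletion_root)
```

The same logical deletion is classified differently before and after the overwrite, because its meaning depends on a field that other operations are free to rewrite.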

So. Something fully orthogonal. New column, or record "deletion roots"
or something...

After consideration, I'm thinking a new presence value of
"base-deleted". In the above two possibilities:

1) presence="base-deleted" means A/B/foo refers to the BASE
2) presence="not-present" means A/B/foo is a deletion of A/C/foo post-copy
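A rough sketch of how the two cases could be told apart once the presence value itself carries the distinction. Everything here is a stand-in for illustration, not the real schema or API: WORKING is modelled as a dict keyed by path, and the helper names (delete_from_base, copy_to, delete_from_copy, explain) are hypothetical. The second scenario assumes a variant BASE tree in which A/C has a child foo and A/B does not.

```python
from enum import Enum


class Presence(Enum):
    NORMAL = "normal"
    NOT_PRESENT = "not-present"
    BASE_DELETED = "base-deleted"  # the proposed new value


def delete_from_base(working, base_paths, root):
    """Record the deletion of BASE node `root` and every BASE node below it."""
    for path in base_paths:
        if path == root or path.startswith(root + "/"):
            working[path] = {"presence": Presence.BASE_DELETED, "copyfrom": None}


def copy_to(working, src, dst):
    """Overwrite the row at `dst` with a copy root; child rows are left alone."""
    working[dst] = {"presence": Presence.NORMAL, "copyfrom": src}


def delete_from_copy(working, path):
    """Record the deletion of a node that arrived as part of a copy."""
    working[path] = {"presence": Presence.NOT_PRESENT, "copyfrom": None}


def explain(working, path):
    """Say what the row at `path` means, using only its presence value."""
    presence = working[path]["presence"]
    if presence is Presence.BASE_DELETED:
        return "deletion of the BASE node"
    if presence is Presence.NOT_PRESENT:
        return "deletion of a copied node"
    return "present"


# Case 1: the tree from this mail. Delete A/B, then copy A/C@rev onto A/B.
case1 = {}
delete_from_base(case1, ["A", "A/B", "A/B/foo", "A/C"], "A/B")
copy_to(case1, "A/C@rev", "A/B")

# Case 2: assumed variant tree where foo lives under A/C instead of A/B.
# Delete A/B, copy A/C@rev onto A/B, then delete A/B/foo post-copy.
case2 = {}
delete_from_base(case2, ["A", "A/B", "A/C", "A/C/foo"], "A/B")
copy_to(case2, "A/C@rev", "A/B")
delete_from_copy(case2, "A/B/foo")
```

In both cases the A/B row ends up as a copy root, yet explain() on A/B/foo gives different answers without consulting copyfrom at all.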

Everywhere we talk about not-present, we'd have to extend a bit, and
probably adjust scan_working and crap.

Oh, and since A/B "shadows" a BASE node, then we implicitly know that
a deletion happened first, before <whatever> brought A/B into the
WORKING tree there (a replacement).
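The "shadows" inference can be sketched as follows. The dict-based tables and the function names shadows_base and is_replacement are assumptions for illustration only; treating any presence other than the two deletion values as "something is there" is likewise a simplification.

```python
DELETED = {"not-present", "base-deleted"}


def shadows_base(base_paths, working, path):
    """A WORKING row shadows BASE when a BASE node exists at the same path."""
    return path in working and path in base_paths


def is_replacement(base_paths, working, path):
    """A present WORKING node on top of a BASE node implies delete-then-add."""
    return (shadows_base(base_paths, working, path)
            and working[path]["presence"] not in DELETED)


base_paths = {"A", "A/B", "A/B/foo", "A/C"}
working = {
    "A/B": {"presence": "normal", "copyfrom": "A/C@rev"},
    "A/B/foo": {"presence": "base-deleted", "copyfrom": None},
}
```

Here A/B is reported as a replacement, while A/B/foo shadows BASE but is only a deletion.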

Thoughts?

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1512688

Re: representational problem in the schema

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Apr 2, 2009 at 05:35, Hyrum K. Wright
<hy...@mail.utexas.edu> wrote:
> On Apr 1, 2009, at 8:46 PM, Greg Stein wrote:
>
>> Hey Hyrum,
>>
>> I'm trying to work out the discrepancy in how we handle the COPIED
>> flag to satisfy both revert/17 and update/54. I figured out an
>> approach, coded something up, and it works. But then I started
>> thinking...
>>
>> I should probably stop doing that. :-P
>
> Yeah.  I was thinking today that it's amazing that we're on the brink of
> getting >1100 test cases working with a *completely new* entries storage
> paradigm.  That's pretty crazy.

hehe... yeah :-)

>  But I'm scared about the umpteen billion
> other weird cases which we don't test, and at some point, the workaround
> hacks will bite us.
>
> Then again, we can only go so far to ensure perfect compat.  We can't be, nor
> should we be, perfectly bug compatible.

Agreed. I think if we get the 1100 tests to pass, plus the extra test
for the read_entries() paths that we've talked about... then we've
done well more than I think anybody would ask for. And I think that's
just about Right :-)

>...
>> After consideration, I'm thinking a new presence value of
>> "base-deleted". In the above two possibilities:
>>
>> 1) presence="base-deleted" means A/B/foo refers to the BASE
>> 2) presence="not-present" means A/B/foo is a deletion of A/C/foo post-copy
>>
>> Everywhere we talk about not-present, we'd have to extend a bit, and
>> probably adjust scan_working and crap.
>
> This sounds reasonable.  Are there additional scenarios where we're going to
> have to think about even more presence values, or does this just about cover
> it?

Oh, this should be it. Recognize that our schema fully supports our
client-side operation today. That's why we've been able to come so
far.

"should be" isn't "definitely" though. This base-deleted issue has
arisen because we're near the boundary of simply storing
representational stuff and that of *intent*. That is, what to do at
commit time. When we revamp the commit logic, then we may discover
other situations. We may also find a schema change is in order to
better support libsvn_client operations.

>> Oh, and since A/B "shadows" a BASE node, then we implicitly know that
>> a deletion happened first, before <whatever> brought A/B into the
>> WORKING tree there (a replacement).
>
> Sure, but will this always be the case, or are there cases where we
> could get into a situation where the parent isn't shadowing a BASE node, and
> therefore can't make this implicit assumption?  I can't think of any.

Every BASE node needs a parent. Therefore, if a WORKING node has a
parent WORKING node, then it, too, shadows a BASE node. We should be
fine there.

Okay. I'm going to start on the base-deleted presence value, then
return to the read_entries() testing.

(there is also a read_entries() simplification that I'd like to do;
left a comment in there in my last commit)

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1517275


Re: representational problem in the schema

Posted by "Hyrum K. Wright" <hy...@mail.utexas.edu>.
On Apr 1, 2009, at 8:46 PM, Greg Stein wrote:

> Hey Hyrum,
>
> I'm trying to work out the discrepancy in how we handle the COPIED
> flag to satisfy both revert/17 and update/54. I figured out an
> approach, coded something up, and it works. But then I started
> thinking...
>
> I should probably stop doing that. :-P

Yeah.  I was thinking today that it's amazing that we're on the brink  
of getting >1100 test cases working with a *completely new* entries  
storage paradigm.  That's pretty crazy.  But I'm scared about the  
umpteen billion other weird cases which we don't test, and at some  
point, the workaround hacks will bite us.

Then again, we can only go so far to ensure perfect compat.  We can't be,  
nor should we be, perfectly bug compatible.

> Consider the following BASE tree:
>
>  A
>  A/B
>  A/B/foo
>  A/C
>
> Next, I delete the B directory. So we create two rows in WORKING:
>
>  A/B, presence=not-present
>  A/B/foo, presence=not-present
>
> All good. Now, I copy A/C@rev to A/B. Now we have something like:
>
>  A/B, copyfrom=blah
>  A/B/foo, presence=not-present
>
> Now. Just given the above data, talk to me about A/B/foo.
>
> <pause>

*crickets*

I see what the problem could be, and it's exactly what you describe  
below.

> Well.... there are two possibilities:
>
> 1) A/B/foo represents the deletion of the BASE node
> 2) A/B/foo represents a deletion of A/C/foo post-copy
>
>
> I've been thinking about the best way to represent this in our schema.
> I've come up with this approach:
>
> * leave it untouched, but interpret case (2) as distinguished using
> copyfrom records. we apply copyfrom info to the root of the deletion
> from the copied subtree. if no copyfrom info exists on the deletion
> root, then it represents a deletion from BASE
>
> But this feels a bit fragile. Something could come along and overwrite
> that node with other data, much like we overwrote A/B.
>
> So. Something fully orthogonal. New column, or record "deletion roots"
> or something...
>
> After consideration, I'm thinking a new presence value of
> "base-deleted". In the above two possibilities:
>
> 1) presence="base-deleted" means A/B/foo refers to the BASE
> 2) presence="not-present" means A/B/foo is a deletion of A/C/foo  
> post-copy
>
> Everywhere we talk about not-present, we'd have to extend a bit, and
> probably adjust scan_working and crap.

This sounds reasonable.  Are there additional scenarios where we're  
going to have to think about even more presence values, or does this  
just about cover it?

> Oh, and since A/B "shadows" a BASE node, then we implicitly know that
> a deletion happened first, before <whatever> brought A/B into the
> WORKING tree there (a replacement).

Sure, but will this always be the case, or are there cases where  
we could get into a situation where the parent isn't shadowing a BASE  
node, and therefore can't make this implicit assumption?  I can't  
think of any.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1513479

Re: representational problem in the schema

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 02, 2009 at 03:46:02AM +0200, Greg Stein wrote:
> Consider the following BASE tree:
> 
>   A
>   A/B
>   A/B/foo
>   A/C
> 
> Next, I delete the B directory. So we create two rows in WORKING:
> 
>   A/B, presence=not-present
>   A/B/foo, presence=not-present
> 
> All good. Now, I copy A/C@rev to A/B. Now we have something like:
> 
>   A/B, copyfrom=blah
>   A/B/foo, presence=not-present
> 
> Now. Just given the above data, talk to me about A/B/foo.
> 
> <pause>
> 
> Well.... there are two possibilities:
> 
> 1) A/B/foo represents the deletion of the BASE node
> 2) A/B/foo represents a deletion of A/C/foo post-copy

This was a big design problem in wc-1. Erik has mentioned it
to me a few times. Thanks for fixing it. And as far as I can
tell your approach sounds sane. And yes, please try to avoid
implicit meanings of fields, i.e. don't make fields in the DB
carry additional meaning if they occur together in some combination.
It's too confusing for people coming late to the party.
The easier to understand the DB schema is, the better.

Stefan