Posted to dev@subversion.apache.org by Alexander Sabourenkov <sc...@lxnt.info> on 2003/04/02 00:06:49 UTC

Re: svndumpfilter - rfc.

> Philip Martin wrote:

> 
> Will be ready to commit in several hours.
> I'll try to write some regression tests after that. Test suite is a 
> whole new chunk of code to learn.

Unfortunately, some outside issues kicked me out of development for some 
time.

As of now I think the patch is ready. Please find it attached, and also
at http://lxnt.info/sdf.patch

It is against r5518.

I implemented a list of dropped nodes; any node whose 
copyfrom-path is in this list gets dropped too.

I also made revision renumbering optional and made sure that 
copyfrom-rev headers of nodes get rewritten according to revision 
renumbering history.
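
A minimal sketch of the two mechanisms just described, written in Python rather than the C of the patch. Everything here is an assumption made for illustration: the record layout (dicts with "path", "copyfrom_path", "copyfrom_rev"), the name filter_dump, and the choice to drop revisions that end up empty and to point references to them at the last kept revision. None of it is taken from the patch itself.

```python
def filter_dump(revisions, excluded_prefixes, renumber=True):
    """Filter a parsed dump (hypothetical in-memory layout).

    `revisions` is a list of (revnum, nodes) pairs in dump order; each node
    is a dict with "path" and, for copies, "copyfrom_path"/"copyfrom_rev".
    """
    dropped = set()   # paths of every node dropped so far
    rev_map = {}      # old revision number -> new revision number
    next_rev = 0
    out = []

    def is_excluded(path):
        return any(path == p or path.startswith(p + "/")
                   for p in excluded_prefixes)

    for old_rev, nodes in revisions:
        kept = []
        for node in nodes:
            src = node.get("copyfrom_path")
            # Drop excluded nodes, and nodes copied from a dropped node.
            if is_excluded(node["path"]) or (src is not None and src in dropped):
                dropped.add(node["path"])
                continue
            kept.append(dict(node))  # copy, so the input is not modified

        if renumber and not kept and old_rev != 0:
            # Assumption: an emptied revision is dropped, and references
            # to it are redirected to the last revision that was kept.
            rev_map[old_rev] = max(next_rev - 1, 0)
            continue

        new_rev = next_rev if renumber else old_rev
        next_rev += 1
        rev_map[old_rev] = new_rev

        # Rewrite copyfrom-rev according to the renumbering history.
        for node in kept:
            if "copyfrom_rev" in node:
                node["copyfrom_rev"] = rev_map[node["copyfrom_rev"]]
        out.append((new_rev, kept))
    return out


revisions = [
    (0, []),
    (1, [{"path": "trunk"}, {"path": "secret"}]),
    (2, [{"path": "secret/a"}]),
    (3, [{"path": "copy", "copyfrom_path": "secret/a", "copyfrom_rev": 2}]),
    (4, [{"path": "branch", "copyfrom_path": "trunk", "copyfrom_rev": 3}]),
]
filtered = filter_dump(revisions, ["secret"])
```

In this example the copy made from secret/a is dropped along with its source, revisions 2 and 3 disappear, and the surviving copy in old revision 4 becomes revision 2 with its copyfrom-rev rewritten from 3 to 1.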


Issues remaining:

The whole deal with dropping copy sources and targets remains untested, 
apart from checking that the result loads without errors. While doing such tests 
I discovered that the dumpfile parser isn't picky about incoming data: I 
accidentally changed all Node-copyfrom-rev headers to Revision-number 
headers, and it bombed out only when it failed to find a copyfrom-path, 
about 70 revisions after the first copy in the dump.
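
To illustrate the kind of early check that would have caught the mangled headers at the offending record, here is a small Python sketch. The header names are the ones the dump format uses, but the function check_node_headers and the specific rules it applies are hypothetical; they do not describe what the parser in libsvn_repos does.

```python
def check_node_headers(headers):
    """Return a list of problems found in one node record's headers.

    `headers` is a dict mapping header names to string values.
    The rules below are illustrative assumptions only.
    """
    problems = []
    if "Node-path" not in headers:
        problems.append("missing Node-path")
    if "Revision-number" in headers:
        problems.append("Revision-number inside a node record")
    has_path = "Node-copyfrom-path" in headers
    has_rev = "Node-copyfrom-rev" in headers
    if has_path != has_rev:
        problems.append(
            "Node-copyfrom-path and Node-copyfrom-rev must appear together")
    return problems


# A record mangled the way described above: copyfrom-rev was renamed.
mangled = {
    "Node-path": "branches/b",
    "Revision-number": "3",
    "Node-copyfrom-path": "trunk",
}
problems = check_node_headers(mangled)
```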

If a node has a copyfrom-rev that is the same as the revision of some dropped node, 
and the former happens to appear before the latter, it won't get dropped.

Filtered stream will have SVN-fs-dump-format-version set to the current 
version the binary was compiled with.

I'm quite at a loss as to what to do with the remove_node_props entry in the vtable. 
I haven't found a single clue in libsvn_repos/dump.c or load.c.

--

./lxnt





Re: svndumpfilter - rfc.

Posted by Greg Stein <gs...@lyra.org>.
On Fri, Apr 11, 2003 at 10:44:15AM -0400, Paul Lussier wrote:
> In a message dated: 11 Apr 2003 09:32:27 CDT
> cmpilato@collab.net said:
> >Paul Lussier <pl...@lanminds.com> writes:
> >
> >>   Greg> It is a frickin' text file with quite easy-to-handle syntax
> >>   Greg> and semantics.
> >> 
> >> IMO, this screams for a perl solution :)
> >
> >The man quotes Greg Stein, and then says he wants Perl.  All I have to
> >say is, I'm shielding my eyes from the carnage now.

*laf*

> I have nothing against Python, other than I don't know it, and as 
> yet, I've not seen a reason to learn yet another scripting language.
> Perl was specifically designed with text munging in mind, and since 
> that's what I spend most of my life as a sysadmin doing, I'm pretty 
> good at it.

Oh, by all means. We have a number of Perl scripts in our tools/ area, and
it would be nice to add one for filtering. As Alexander said, the
svndumpfilter is a very fine tool, and does its work quite well. I just tend
to think that a Perl/Python script will enable new kinds of features and
functionality that we haven't thought about and/or are unwilling to invest
time in at this point.

Regarding Perl vs Python. Yes, Perl is great at text processing. Personally,
I think it falls over once you start to manipulate data structures. In this
case, you're going to parse the dumpfile into structures, then manipulate
those in some logical way. I'd choose Python for that, but if you can get
Perl to work... more power to ya :-)

> If you don't want a Perl solution, that's fine.  I'm simply offering 

Oh no... we'd go ahead and add that to our tools/ area.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svndumpfilter - rfc.

Posted by Paul Lussier <pl...@lanminds.com>.
In a message dated: 11 Apr 2003 09:32:27 CDT
cmpilato@collab.net said:

>Paul Lussier <pl...@lanminds.com> writes:
>
>>   Greg> It is a frickin' text file with quite easy-to-handle syntax
>>   Greg> and semantics.
>> 
>> IMO, this screams for a perl solution :)
>
>The man quotes Greg Stein, and then says he wants Perl.  All I have to
>say is, I'm shielding my eyes from the carnage now.

I have nothing against Python, other than I don't know it, and as 
yet, I've not seen a reason to learn yet another scripting language.
Perl was specifically designed with text munging in mind, and since 
that's what I spend most of my life as a sysadmin doing, I'm pretty 
good at it.

If you don't want a Perl solution, that's fine.  I'm simply offering 
to help with something that finally seems to match my programming 
capabilities.  I take no offense if you decline the offer, since that 
just means I get to sit around here and do nothing some more ;)
-- 

Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853  E808 BB07 9239 53F1 28EE

	It may look like I'm just sitting here doing nothing,
   but I'm really actively waiting for all my problems to go away.

	 If you're not having fun, you're not doing it right!




Re: svndumpfilter - rfc.

Posted by cm...@collab.net.
Paul Lussier <pl...@lanminds.com> writes:

>   Greg> It is a frickin' text file with quite easy-to-handle syntax
>   Greg> and semantics.
> 
> IMO, this screams for a perl solution :)

The man quotes Greg Stein, and then says he wants Perl.  All I have to
say is, I'm shielding my eyes from the carnage now.


Re: svndumpfilter - rfc.

Posted by Paul Lussier <pl...@lanminds.com>.
>>>>> On Wed, 9 Apr 2003, "Greg" == Greg Stein wrote:

  Greg> Why is it a C program rather than a script? :-)

  Greg> this *really* begs to be scripted.


>>>>> On Thu, 10 Apr 2003, "Greg" == Greg Stein wrote:

  Greg> I just think that we'll end up with some scripting tools in
  Greg> the long run. *shrug*

I haven't been following this thread too closely, so I'm not 
completely sure what svndumpfilter's purpose is; however, if 
someone's looking for a scripted filter, and doesn't mind perl, I'll 
gladly take a stab at it.  And since, as Greg put it:

  Greg> It is a frickin' text file with quite easy-to-handle syntax
  Greg> and semantics.

IMO, this screams for a perl solution :)
-- 

Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853  E808 BB07 9239 53F1 28EE

	It may look like I'm just sitting here doing nothing,
   but I'm really actively waiting for all my problems to go away.

	 If you're not having fun, you're not doing it right!




Re: svndumpfilter - rfc.

Posted by Greg Stein <gs...@lyra.org>.
On Thu, Apr 10, 2003 at 04:48:50PM -0500, Ben Collins-Sussman wrote:
> Greg Stein <gs...@lyra.org> writes:
> 
> > > I guess you mean a script using the Swig libsvn_repos bindings?
> > > Perhaps using Python?  I suppose that would be possible--but would it
> > > involve writing a (Swig? Python?) interface to svn_repos_parse_fns_t?
> > 
> > Not at all. It is a frickin' text file with quite easy-to-handle syntax and
> > semantics.
>...
> 
> I wish you had brought this up before he had started coding.  We all
> knew he was going to write svndumpfilter in C... I even encouraged him
> to use svn_repos_parse_fns_t.  :-(

The work is not invalid. I just think there was a better way to build it,
and a better way to create something with more flexibility.

But I *do* believe having the C code is better than having nothing. (and I
certainly didn't have time to change that "nothing" into a (Python) script)
Alexander's work hasn't gone to waste. I just think that we'll end up with
some scripting tools in the long run. *shrug*

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: svndumpfilter - rfc.

Posted by Alexander Sabourenkov <sc...@lxnt.info>.
Ben Collins-Sussman wrote:
> Greg Stein <gs...@lyra.org> writes:
> 

[...]

>>Not at all. It is a frickin' text file with quite easy-to-handle syntax and
>>semantics.
>>
>>"Hey, I've got this hammer [called svn_repos_parse_fns_t], so why don't I
>> use that for everything?"
> 
> 
> I wish you had brought this up before he had started coding.  We all
> knew he was going to write svndumpfilter in C... I even encouraged him
> to use svn_repos_parse_fns_t.  :-(
> 

Please don't feel sorry for me. Getting to know Subversion internals and 
coding style was an extremely helpful experience for me. I've got the 
functionality that I needed, everyone else got their most basic needs covered, and 
I'm happy with that.

At least it won't silently fall out of sync with dumpfile format.

-- 

./lxnt



Re: svndumpfilter - rfc.

Posted by Ben Collins-Sussman <su...@collab.net>.
Greg Stein <gs...@lyra.org> writes:

> > I guess you mean a script using the Swig libsvn_repos bindings?
> > Perhaps using Python?  I suppose that would be possible--but would it
> > involve writing a (Swig? Python?) interface to svn_repos_parse_fns_t?
> 
> Not at all. It is a frickin' text file with quite easy-to-handle syntax and
> semantics.
> 
> "Hey, I've got this hammer [called svn_repos_parse_fns_t], so why don't I
>  use that for everything?"

I wish you had brought this up before he had started coding.  We all
knew he was going to write svndumpfilter in C... I even encouraged him
to use svn_repos_parse_fns_t.  :-(



Re: svndumpfilter - rfc.

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Apr 09, 2003 at 11:48:53PM +0100, Philip Martin wrote:
> Greg Stein <gs...@lyra.org> writes:
>...
> > I say that only somewhat in jest -- this *really* begs to be scripted. The
> > kinds of changes that a person may want to make to a dump file seem a lot
> > more expansive than the couple options provided.
> 
> I guess you mean a script using the Swig libsvn_repos bindings?
> Perhaps using Python?  I suppose that would be possible--but would it
> involve writing a (Swig? Python?) interface to svn_repos_parse_fns_t?

Not at all. It is a frickin' text file with quite easy-to-handle syntax and
semantics.

"Hey, I've got this hammer [called svn_repos_parse_fns_t], so why don't I
 use that for everything?"


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: svndumpfilter - rfc.

Posted by Philip Martin <ph...@codematters.co.uk>.
Greg Stein <gs...@lyra.org> writes:

> On Wed, Apr 09, 2003 at 09:49:27PM +0100, Philip Martin wrote:
> >...
> > For those of you not paying attention, this patch introduces
> > svndumpfilter, a program to filter repository dump files.  It can be
> > used like this
> 
> Why is it a C program rather than a script? :-)
> 
> I say that only somewhat in jest -- this *really* begs to be scripted. The
> kinds of changes that a person may want to make to a dump file seem a lot
> more expansive than the couple options provided.

I guess you mean a script using the Swig libsvn_repos bindings?
Perhaps using Python?  I suppose that would be possible--but would it
involve writing a (Swig? Python?) interface to svn_repos_parse_fns_t?

-- 
Philip Martin


Re: svndumpfilter - rfc.

Posted by Michael Price <mp...@atl.lmco.com>.
Paul Lussier wrote:
> In a message dated: 09 Apr 2003 21:08:20 CDT Ben Collins-Sussman said:
 >
>>(On the other hand, we *did* make it RFC822 compatible.  So python
>>probably has a module to parse it already.  :-) )
> 
> Maybe this is a stupid question, maybe not, but why RFC822 and not 
> RFC2822, which has since superseded RFC822?

Relevant links for interested parties who don't already know where to 
find RFCs:

   http://www.faqs.org/ftp/rfc/rfc2822.txt
   http://www.faqs.org/ftp/rfc/rfc822.txt

The main differences, as far as I can tell, are that the newer standard is 
stricter in what it accepts and that there are several obsolete tokens 
which must still be interpreted but can no longer be generated.

Michael



Re: svndumpfilter - rfc.

Posted by Greg Stein <gs...@lyra.org>.
Fine. If people want to be pedantic, then we'll say "RFC822-ish".

-g

On Thu, Apr 10, 2003 at 12:00:36PM -0400, Greg Hudson wrote:
> On Thu, 2003-04-10 at 11:24, Ben Collins-Sussman wrote:
> > My understanding of an RFC 822 parser was this:
> 
> Sadly, people who write mail user agents often have a similar level of
> understanding.
> 
> Our dump format looks a little bit like RFC 822.  It probably isn't
> precisely compatible in either direction.  That's just fine with me; I
> don't think RFC 822 is a very good standard once you get into the
> details.
> 
> >   * notice if one of the headers is "Content-length:"
> 
> Content-Length is from HTTP, not RFC 822 or 2822.
> 
> 

-- 
Greg Stein, http://www.lyra.org/


Re: svndumpfilter - rfc.

Posted by Greg Hudson <gh...@MIT.EDU>.
On Thu, 2003-04-10 at 11:24, Ben Collins-Sussman wrote:
> My understanding of an RFC 822 parser was this:

Sadly, people who write mail user agents often have a similar level of
understanding.

Our dump format looks a little bit like RFC 822.  It probably isn't
precisely compatible in either direction.  That's just fine with me; I
don't think RFC 822 is a very good standard once you get into the
details.

>   * notice if one of the headers is "Content-length:"

Content-Length is from HTTP, not RFC 822 or 2822.



Re: svndumpfilter - rfc.

Posted by Michael Price <mp...@atl.lmco.com>.
Ben Collins-Sussman wrote:
> My understanding of an RFC 822 parser was this:
> 
>   * read a bunch of headers, each a field name followed by a colon (":")
>   * notice if one of the headers is "Content-length:"

Content-length: isn't actually listed in the spec. You are free to use 
it as an optional-field though.

>   * look for a blank line.
>   * read 'Content-length' bytes of body

In the standard, all text after the "message" is the "body" and no 
length is specified. Nothing prevents people from using Content-length: 
to specify the body size though.

>   * return the headers and body to the user.
>   * repeat

That is basically it. One paragraph worth noting though is this one:

   "The only required header fields are the origination date field and
    the originator address field(s).  All other header fields are
    syntactically optional.  More information is contained in the table
    following this definition."

This is also true for 822 so you can't "really" claim [2]822 conformance 
without an origination date and originator address. I haven't looked at 
how subversion uses [2]822 so it may already do this.

Just letting people know.

-- 
Michael H. Price II       Member of the Engineering Staff
Lockheed Martin -- Advanced Technology Laboratories
3 Executive Campus, 6th Floor, Cherry Hill, NJ  08002
856-792-9746, fax 856-792-9925, email mprice@atl.lmco.com



Re: svndumpfilter - rfc.

Posted by Ben Collins-Sussman <su...@collab.net>.
Paul Lussier <pl...@lanminds.com> writes:

> Everyone knows about 822, but only schmucks like me who love arcane 
> technical trivia know about obsoleting RFCs like 2822 :)
> 
> As Michael Price pointed out, not a lot has changed.  2822 is a 
> little stricter in some areas.  Achieving 822 compliance solves the 
> 80/20 rule for you, so don't sweat it :)

My understanding of an RFC 822 parser was this:

  * read a bunch of headers, each a field name followed by a colon (":")
  * notice if one of the headers is "Content-length:"
  * look for a blank line.
  * read 'Content-length' bytes of body
  * return the headers and body to the user.
  * repeat

:-)
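
The loop above can be sketched in a few lines of Python. This is only an illustration of those steps under stated assumptions: the function name read_records is made up, blank lines between records are skipped, and a record without a Content-length header is treated as having an empty body. It is not the parser in libsvn_repos, and it makes no claim to RFC 822 conformance.

```python
import io


def read_records(stream):
    """Yield (headers, body) pairs from a binary stream."""
    while True:
        line = stream.readline()
        # Skip blank lines separating one record from the next.
        while line in (b"\n", b"\r\n"):
            line = stream.readline()
        if not line:
            return  # end of input

        # Read "Name: value" header lines up to the blank line.
        headers = {}
        while line not in (b"", b"\n", b"\r\n"):
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
            line = stream.readline()

        # Read exactly Content-length bytes of body (none if absent).
        length = int(headers.get("Content-length", "0"))
        body = stream.read(length)
        yield headers, body


sample = (
    b"Revision-number: 1\n"
    b"Content-length: 5\n"
    b"\n"
    b"hello\n"
    b"\n"
    b"Node-path: trunk\n"
    b"Node-kind: dir\n"
    b"\n"
)
records = list(read_records(io.BytesIO(sample)))
```

Reading the body by byte count rather than by line is what lets a record carry arbitrary binary content.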


Re: svndumpfilter - rfc.

Posted by Paul Lussier <pl...@lanminds.com>.
In a message dated: 10 Apr 2003 10:05:48 CDT
Ben Collins-Sussman said:

>Paul Lussier <pl...@lanminds.com> writes:
>
>> In a message dated: 09 Apr 2003 21:08:20 CDT
>> Ben Collins-Sussman said:
>> 
>> >(On the other hand, we *did* make it RFC822 compatible.  So python
>> >probably has a module to parse it already.  :-) )
>> 
>> Maybe this is a stupid question, maybe not, but why RFC822 and not 
>> RFC2822, which has since superseded RFC822?  Was it just easier to 
>> implement 822 compliance, and the assumption was that this should 
>> satisfy the 80/20 rule?  Or was it just an oversight and no one 
>> remembered 2822 (which wouldn't surprise me, since it's nowhere near 
>> as famous as 822 :)
>
>RFC 2822?  Huh?
>
>/me looks at gstein quizzically...

Yep, that's about what I thought :)

Everyone knows about 822, but only schmucks like me who love arcane 
technical trivia know about obsoleting RFCs like 2822 :)

As Michael Price pointed out, not a lot has changed.  2822 is a 
little stricter in some areas.  Achieving 822 compliance solves the 
80/20 rule for you, so don't sweat it :)
-- 

Seeya,
Paul




Re: svndumpfilter - rfc.

Posted by Ben Collins-Sussman <su...@collab.net>.
Paul Lussier <pl...@lanminds.com> writes:

> In a message dated: 09 Apr 2003 21:08:20 CDT
> Ben Collins-Sussman said:
> 
> >(On the other hand, we *did* make it RFC822 compatible.  So python
> >probably has a module to parse it already.  :-) )
> 
> Maybe this is a stupid question, maybe not, but why RFC822 and not 
> RFC2822, which has since superseded RFC822?  Was it just easier to 
> implement 822 compliance, and the assumption was that this should 
> satisfy the 80/20 rule?  Or was it just an oversight and no one 
> remembered 2822 (which wouldn't surprise me, since it's nowhere near 
> as famous as 822 :)

RFC 2822?  Huh?

/me looks at gstein quizzically...




Re: svndumpfilter - rfc.

Posted by Paul Lussier <pl...@lanminds.com>.
In a message dated: 09 Apr 2003 21:08:20 CDT
Ben Collins-Sussman said:

>(On the other hand, we *did* make it RFC822 compatible.  So python
>probably has a module to parse it already.  :-) )

Maybe this is a stupid question, maybe not, but why RFC822 and not 
RFC2822, which has since superseded RFC822?  Was it just easier to 
implement 822 compliance, and the assumption was that this should 
satisfy the 80/20 rule?  Or was it just an oversight and no one 
remembered 2822 (which wouldn't surprise me, since it's nowhere near 
as famous as 822 :)

I'm just curious, I have no special reason for demanding/requesting 
2822 compliance other than it is the current/superseding RFC which 
obsoletes 822.
-- 

Seeya,
Paul
--
Key fingerprint = 1660 FECC 5D21 D286 F853  E808 BB07 9239 53F1 28EE

	It may look like I'm just sitting here doing nothing,
   but I'm really actively waiting for all my problems to go away.

	 If you're not having fun, you're not doing it right!




Re: svndumpfilter - rfc.

Posted by Ben Collins-Sussman <su...@collab.net>.
Greg Stein <gs...@lyra.org> writes:

> On Wed, Apr 09, 2003 at 09:49:27PM +0100, Philip Martin wrote:
> >...
> > For those of you not paying attention, this patch introduces
> > svndumpfilter, a program to filter repository dump files.  It can be
> > used like this
> 
> Why is it a C program rather than a script? :-)
> 
> I say that only somewhat in jest -- this *really* begs to be scripted. The
> kinds of changes that a person may want to make to a dump file seem a lot
> more expansive than the couple options provided.

Well, for one, our dumpfile parser is abstracted in C.  Just hand it a
vtable.  I'd rather not see people reinvent their own parser in a
scripting language.

(On the other hand, we *did* make it RFC822 compatible.  So python
probably has a module to parse it already.  :-) )
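
As a rough illustration of that last remark, Python's standard email package can read the header block of a single record. The sketch below assumes the record's headers have already been isolated as text; the helper name parse_header_block and the sample values are made up. It covers headers only, since record bodies are length-delimited and may be binary, which the email package is not designed for.

```python
from email.parser import HeaderParser


def parse_header_block(text):
    """Parse "Name: value" lines into a dict using the email package."""
    message = HeaderParser().parsestr(text)
    return dict(message.items())


# Hypothetical header block for one node record.
block = (
    "Node-path: trunk/foo.c\n"
    "Node-kind: file\n"
    "Node-action: add\n"
    "Content-length: 12\n"
    "\n"
)
node_headers = parse_header_block(block)
```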


Re: svndumpfilter - rfc.

Posted by Greg Stein <gs...@lyra.org>.
On Wed, Apr 09, 2003 at 09:49:27PM +0100, Philip Martin wrote:
>...
> For those of you not paying attention, this patch introduces
> svndumpfilter, a program to filter repository dump files.  It can be
> used like this

Why is it a C program rather than a script? :-)

I say that only somewhat in jest -- this *really* begs to be scripted. The
kinds of changes that a person may want to make to a dump file seem a lot
more expansive than the couple options provided.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: svndumpfilter - rfc.

Posted by Philip Martin <ph...@codematters.co.uk>.
Alexander Sabourenkov <sc...@lxnt.info> writes:

> As of now I think the patch is ready. Please find it attached, and also
> at http://lxnt.info/sdf.patch
> 
> It is against r5518.
> 
> I implemented a list of dropped nodes, and any node that has
> copyfrom-path in this list gets dropped too.

I committed this patch in r5596.  I made a few modifications so you
will get a conflict if you update a patched working copy.  For
reference I changed indentation/whitespace for consistency through the
file and to match the HACKING guidelines, I removed some unnecessary
casts, I tweaked some printf formats, and I changed the
svn_string_create into an apr_pstrdup.

For those of you not paying attention, this patch introduces
svndumpfilter, a program to filter repository dump files.  It can be
used like this

$ svnadmin dump repo1 | svndumpfilter [args] | svnadmin load repo2

to remove parts of a repository during a dump/load cycle.

-- 
Philip Martin
