Posted to general@hadoop.apache.org by Nigel Daley <nd...@mac.com> on 2011/04/10 08:09:27 UTC

HADOOP-7106: Re-organize hadoop subversion layout

All, 

As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday April 21 at 11am PDT.

- I will send out reminders leading up to that date.
- I will announce on IRC when I'm about to start the changes.
- I will run the script to make the changes.
- Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time?
- Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?)

More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit

Cheers,
Nige

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Owen O'Malley <om...@apache.org>.

On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote:

> As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday April 21 at 11am PDT.

This is still premature.

Your patch is out of date with respect to the current branches. Please update it and upload a new version.

What changes do we want to make to the asf-authorization file? Clearly that needs to be decided before anything is done.

What changes are we making to the notifications? That also hasn't been discussed at all.

Again, has anyone talked to Jukka and the rest of the infrastructure-dev to see if there is anything that we need to do to minimize transition pain? Clearly, the notifications for git.apache.org need to be maintained, but I don't know if there are other actions that would help minimize the data loss.

Clearly any change where all of the git hashes change is massively disruptive. I suspect there is some magic you can do to keep more of the history, but I don't know the subversion-git bridge very well.

-- Owen

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Nigel Daley <nd...@mac.com>.

I could start this at 2pm on Friday if that suited folks better.

Nige

On Apr 19, 2011, at 10:20 PM, Todd Lipcon wrote:

> On Tue, Apr 19, 2011 at 10:02 PM, Nigel Daley <nd...@mac.com> wrote:
> 
>> I'm still planning to make this SVN change on Thursday this week.
>> 
>> Ian, Owen, Todd, note the questions I ask you below.  Can you help with
>> these on Thursday?
>> 
> 
> Unfortunately I'm out of the office most of the day on Thursday with a
> customer. I'll be available Thursday evening, though, to help with any
> cleanup/etc.
> 
> I'm currently looking into how the git mirrors are setup in Apache-land.
> 
> My guess is that there will be some disturbance to developers on Thurs
> afternoon / Friday as this gets sorted out, even if we try to plan as much
> as possible. Would it be better to do this on Friday so that we have the
> weekend to fix up broken pieces before people get to work on Monday?
> 
> -Todd
> 
> 
>> On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote:
>> 
>> All,
>> 
>> As discussed in Jan/Feb, I'd like to coordinate a date for committing the
>> re-organization of our svn layout:
>> https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday
>> April 21 at 11am PDT.
>> 
>> - I will send out reminders leading up to that date.
>> - I will announce on IRC when I'm about to start the changes.
>> - I will run the script to make the changes.
>> - Ian, can you update the asf-authorization-template file and the
>> asf-mailer.conf files at the same time?
>> - Owen/Todd/Jukka, can you make sure that actions needed by git users are
>> taken care of at the same time? (what are these?)
>> 
>> More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit
>> 
>> Cheers,
>> Nige
>> 
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Ian Holsman <ia...@holsman.net>.

I'm traveling at the moment (when am I not?), and probably won't have access until Monday.
I'll look at the files and see if I can do some stuff today without impacting anything.

On Apr 20, 2011, at 6:58 AM, Todd Lipcon wrote:

> On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com> wrote:
> 
> I'm currently looking into how the git mirrors are setup in Apache-land.
> 
> Git-wise, I think we have two options:
> 
> Option 1)
> - Create a new git mirror for the new hadoop/ tree. This will have no history.
> - On the Apache side, fetch the split-project git mirrors into the combined git mirror as branches - eg hadoop-hdfs.git:trunk becomes a branch named something like pre-HADOOP-7106/hdfs/trunk. Thus, when any user fetches, he'll get all the git objects from "prehistory" as well without having to add separate remotes.
> - Add a script or README file explaining how to set up git grafts on the combined hadoop.git so that the new combination branch "foo" looks like a merge of pre-HADOOP-7106/{hdfs,common,mapred}/foo. Since git grafts are local constructs, each git user would have to run this script once after checking out the git tree, after which the history would be "healed"
> 
> Pros:
>  - all existing sha1s stay the same.
>  - Any local branches people might have for works in progress should continue to refer to proper SHA1s and should rebase relatively easily onto the combined trunk
>  - Should be reasonably simple to implement
> 
> Cons:
>  - users have to run a script upon checkout in order to graft back together history
> 
> Option 2)
> - Use git-filter-branch on the split repos to rewrite them as if they always took place in their new subdirectories.
> - Fetch these repos into the merged repo
> - Set up grafts in the merged repo
> - Run git-filter-branch --all in the merged repo, which will make the grafts permanent
> - May have to run git-filter-branch to rewrite some of the git-svn-id: lines in commit messages to trick git-svn.
> 
> This option basically rewrites history so that it looks like the original project split did what we're planning to do now.
> 
> Pros:
>  - we have a single cohesive git repo with no need to have users set up grafts
> 
> Cons:
>  - all of our SHA1s between the original split and now would change (making it harder to rebase local branches for example)
>  - way more opportunity for error, I think.
> 
> I'm leaning towards option 1 above, and happy to write the script which installs the grafts into the user's local repo.
> 
> -Todd
> 
>  
> On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote:
> 
>> All, 
>> 
>> As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday April 21 at 11am PDT.
>> 
>> - I will send out reminders leading up to that date.
>> - I will announce on IRC when I'm about to start the changes.
>> - I will run the script to make the changes.
>> - Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time?
>> - Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?)
>> 
>> More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit
>> 
>> Cheers,
>> Nige
> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

--
Ian Holsman
Ian@Holsman.net
PH: +1-703 879-3128 AOLIM: ianholsman Skype:iholsman

To know recursion, you must first know recursion.
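
Option 1 in the quoted message above depends on a grafts file that each user installs locally. A minimal sketch of how a helper script might generate that file is below, assuming the standard .git/info/grafts format (one line per commit: the commit's SHA-1 followed by the SHA-1s of its replacement parents, separated by spaces). The function names and the idea of ordering parents by project name are hypothetical and are not taken from the thread.

```python
import re

# A full, lowercase, 40-character hex SHA-1.
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def graft_line(commit, parents):
    """Return one grafts-file line: the commit followed by its new parents."""
    ids = [commit, *parents]
    for sha in ids:
        if not SHA1_RE.fullmatch(sha):
            raise ValueError(f"not a full SHA-1: {sha!r}")
    return " ".join(ids)


def combined_branch_graft(first_combined_commit, old_tips):
    """Graft the first commit of a combined branch onto the old project tips.

    old_tips maps a project name (for example common, hdfs, mapred) to the
    tip SHA-1 of its pre-HADOOP-7106 branch. Parents are emitted in sorted
    project-name order (an arbitrary, hypothetical convention).
    """
    parents = [old_tips[name] for name in sorted(old_tips)]
    return graft_line(first_combined_commit, parents) + "\n"
```

The returned text would be appended to .git/info/grafts in the user's clone. Because the file lives only in the local repository, every user has to do this once, which is the drawback listed under Cons for Option 1.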

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Ian Holsman <ia...@holsman.net>.

The other issue is branch creation: you would need to update the SVN/mail configs to add each branch independently, unless the config files support wildcard matching. Infra, can you do this?

On Apr 20, 2011, at 8:26 PM, Konstantin Boudnik wrote:

> On Wed, Apr 20, 2011 at 08:00, Owen O'Malley <om...@apache.org> wrote:
>> After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in:
>> 
>> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
>> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
>> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce
> 
> This seems like adding an insult to injury by creating an artificial
> SVN layout and moving what deemed to be an unrelated components into
> common's namespace.
> 
> The original proposal seems much more straight forward.
> 
> Cos
> 
>> This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone.
>> 
>> -- Owen
> 
>> 

--
Ian Holsman
Ian@Holsman.net
PH: +1-703 879-3128 AOLIM: ianholsman Skype:iholsman

“If we knew what it was we were doing, it would not be called research, would it?”
 – Albert Einstein

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Konstantin Boudnik <co...@boudnik.org>.

On Wed, Apr 20, 2011 at 08:00, Owen O'Malley <om...@apache.org> wrote:
> After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in:
>
> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce

This seems like adding insult to injury by creating an artificial
SVN layout and moving what are deemed to be unrelated components into
common's namespace.

The original proposal seems much more straightforward.

Cos

> This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone.
>
> -- Owen

>

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Paul Davis <pa...@gmail.com>.

> From the project split, subversion was able to track the history across the subversion moves between projects, but not git.
>
> Four questions:
>  1. Is there anything we can do to minimize the history loss in git?

Glancing at the Git history on GitHub for Hadoop HDFS my guess is that
the mirror script is just pulling from $prefix/hdfs. I'm not entirely
certain how promiscuous it'll look for movements from elsewhere in the
svn repo for history outside the root URL.

If you really want to make sure the history in Git stays intact,
my suggestion would be to set up a temporary svn server locally
and test our mirroring scripts against the commands you intend to run.

Paul Davis

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Paul Davis <pa...@gmail.com>.

You might want to look around at the options for imports more. I know
when I was doing git-svn clones it was doing something stupid like 1
rev/s until I found a --log-window setting. Not sure if that's from
the git layer or the svn layer but it made orders of magnitude
difference.

On Fri, May 6, 2011 at 4:04 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hey folks,
>
> FYI I'm in the process of loading one of the SVN dumps onto a server here.
>
> MAN is it slow. Going maybe 2 revisions/sec.... so I should have an SVN
> replica in somewhere around a week to test with.
>
> @Infra: I don't suppose it's possible to get a writable snapshot mounted
> somehow? If I recall correctly, ZFS supports this and svn.apache.org runs
> ZFS?
>
>
> -Todd
>
> On Fri, Apr 29, 2011 at 1:16 PM, Nigel Daley <nd...@mac.com> wrote:
>
>> I can't do this at 2pm now.  Todd, I suspect you want more time to try out
>> the svn/git test anyways.
>>
>> Let's shoot for next Wednesday at 2pm.  Ian should be back by then too.
>>  Any objections?
>>
>> Cheers,
>> Nige
>>
>> On Apr 29, 2011, at 11:36 AM, Owen O'Malley wrote:
>>
>> >
>> > On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote:
>> >
>> >> Wasn't sure how to go about doing that. I guess we need to talk to infra
>> about it? Do you know how we might clone the SVN repos themselves to test
>> with?
>> >
>> > It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 april 2011. You should be able to use those to setup a local
>> subversion.
>> >
>> > -- Owen
>> >
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
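
The test setup discussed in this message (load a dump into a local Subversion repository, then clone it with git-svn) can be sketched as a list of commands. This is only an outline under stated assumptions: the git-svn option referred to above as --log-window is spelled --log-window-size, the value 1000 is an arbitrary illustration, and the function name and paths are hypothetical. The dump file itself is not an argument because svnadmin load reads it from standard input.

```python
import os


def replica_commands(repo_dir, git_dir, log_window=1000):
    """Return argv lists for a local SVN replica plus a git-svn clone of it.

    Nothing is executed here. The caller runs each command in order and
    feeds the dump file to the second one (svnadmin load) on stdin.
    """
    if not os.path.isabs(repo_dir):
        raise ValueError("repo_dir must be absolute to form a file:// URL")
    repo_url = "file://" + repo_dir
    return [
        ["svnadmin", "create", repo_dir],
        ["svnadmin", "load", repo_dir],
        ["git", "svn", "clone", f"--log-window-size={log_window}",
         repo_url, git_dir],
    ]
```

Each inner list can be passed to subprocess.run. Whether a larger log window helps when cloning from a local file:// repository, as opposed to a remote server, is something to measure rather than assume.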

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Todd Lipcon <to...@cloudera.com>.

Hey folks,

FYI I'm in the process of loading one of the SVN dumps onto a server here.

MAN is it slow. Going maybe 2 revisions/sec.... so I should have an SVN
replica in somewhere around a week to test with.

@Infra: I don't suppose it's possible to get a writable snapshot mounted
somehow? If I recall correctly, ZFS supports this and svn.apache.org runs
ZFS?

-Todd

On Fri, Apr 29, 2011 at 1:16 PM, Nigel Daley <nd...@mac.com> wrote:

> I can't do this at 2pm now.  Todd, I suspect you want more time to try out
> the svn/git test anyways.
>
> Let's shoot for next Wednesday at 2pm.  Ian should be back by then too.
>  Any objections?
>
> Cheers,
> Nige
>
> On Apr 29, 2011, at 11:36 AM, Owen O'Malley wrote:
>
> >
> > On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote:
> >
> >> Wasn't sure how to go about doing that. I guess we need to talk to infra
> about it? Do you know how we might clone the SVN repos themselves to test
> with?
> >
> > It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 april 2011. You should be able to use those to setup a local
> subversion.
> >
> > -- Owen
> >
>
>

-- 
Todd Lipcon
Software Engineer, Cloudera
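
The "around a week" estimate above follows from simple arithmetic. The sketch below makes it explicit. The revision count of roughly 1.1 million for the ASF repository at the time is an assumption for illustration, not a figure given in this thread, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def load_days(revisions, revs_per_sec):
    """Days needed to load `revisions` revisions at a steady rate."""
    if revs_per_sec <= 0:
        raise ValueError("revs_per_sec must be positive")
    return revisions / revs_per_sec / SECONDS_PER_DAY


# Assumed repository size (about 1.1 million revisions) at 2 revisions/sec.
print(f"{load_days(1_100_000, 2):.1f} days")
```

Under that assumption the load takes a little over six days, which is consistent with the estimate in the message.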

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Nigel Daley <nd...@mac.com>.

I can't do this at 2pm now.  Todd, I suspect you want more time to try out the svn/git test anyways. 

Let's shoot for next Wednesday at 2pm.  Ian should be back by then too.  Any objections?

Cheers,
Nige

On Apr 29, 2011, at 11:36 AM, Owen O'Malley wrote:

> 
> On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote:
> 
>> Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with?
> 
> It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 april 2011. You should be able to use those to setup a local subversion.
> 
> -- Owen
>

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Owen O'Malley <oo...@yahoo-inc.com>.

On Apr 28, 2011, at 11:24 PM, Todd Lipcon wrote:

> Wasn't sure how to go about doing that. I guess we need to talk to infra about it? Do you know how we might clone the SVN repos themselves to test with?

It looks like there are svn dumps at http://svn-master.apache.org/dump/ from 2 April 2011. You should be able to use those to set up a local Subversion repository.

-- Owen

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Todd Lipcon <to...@cloudera.com>.

On Thu, Apr 28, 2011 at 10:06 PM, Nigel Daley <nd...@mac.com> wrote:

> As announced last week, I'm planning to do this at 2pm PDT tomorrow
> (Friday) April 29.
>
> Suresh, when do you plan to commit HDFS-1052?  That should be done first.
>
> Owen or Todd, did you want to follow Paul's advice:
> > If you're really wanting to make sure to keep the history in Git
> > intact my suggestion would be to setup a temporary svn server locally
> > and test our mirroring scripts against the commands you intend to run.
> If so, how much more time do you need?
>

Wasn't sure how to go about doing that. I guess we need to talk to infra
about it? Do you know how we might clone the SVN repos themselves to test
with?

-Todd

On Apr 20, 2011, at 9:42 PM, Nigel Daley wrote:
>
> > Owen, I'll admit I'm not familiar with all the git details/issues in your
> proposal, but I think the layout change you propose is fine and seems to
> solve the git issues with very minimal impact on the layout.
> >
> > Let's shoot for doing this next Friday, April 29 at 2pm PDT.  I'll update
> the patch and send out a reminder about this later next week.
> >
> > Thanks,
> > Nige
> >
> > On Apr 20, 2011, at 8:00 AM, Owen O'Malley wrote:
> >
> >>
> >> On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote:
> >>
> >>> On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com>
> wrote:
> >>>
> >>>>
> >>>> I'm currently looking into how the git mirrors are setup in
> Apache-land.
> >>
> >> Uh, why isn't infra-dev on this thread?
> >>
> >> For those on infra-dev, the context is that Nigel is trying to merge
> together the source trees of the Hadoop sub-projects that were split apart 2
> years ago. So he is taking:
> >>
> >> prefix = http://svn.apache.org/repos/asf/hadoop/
> >>
> >> $prefix/common/trunk -> $prefix/trunk/common
> >> $prefix/hdfs/trunk -> $prefix/trunk/hdfs
> >> $prefix/mapreduce/trunk -> $prefix/trunk/mapreduce
> >>
> >> and play similar games with the rest of the branches and tags. For more
> details look at HADOOP-7106.
> >>
> >> From the project split, subversion was able to track the history across
> the subversion moves between projects, but not git.
> >>
> >> Four questions:
> >> 1. Is there anything we can do to minimize the history loss in git?
> >> 2. Are we going to be able to preserve our sha's or are they going to
> change again?
> >> 3. What changes do we need to make to the subversion notification file?
> >> 4. Are there any other changes that need to be coordinated?
> >>
> >> After considering it this morning, I believe that the least disruptive
> move is to leave common at the same url and merge hdfs and mapreduce back
> in:
> >>
> >> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
> >> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
> >> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce
> >>
> >> This will preserve the hashes and history for common (and the 20
> branches). We'll still need to play git voodoo to get git history for hdfs
> and mapreduce, but it is far better than starting a brand new git clone.
> >>
> >> -- Owen
> >>
> >>
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
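
The two layouts quoted in this message differ only in where the merged trunk lives. A minimal sketch of both mappings as a path-rewriting function is below, assuming paths are given relative to $prefix and covering trunk only (the thread says branches and tags get similar treatment but does not spell out the rules). The function name and the layout labels "original" and "owen" are hypothetical.

```python
PROJECTS = ("common", "hdfs", "mapreduce")


def remap(path, layout):
    """Map an old per-project trunk path to its place in the merged tree.

    path is relative to $prefix, for example "hdfs/trunk/src/Foo.java".
    layout is "original" (HADOOP-7106 as first proposed) or "owen"
    (common stays at its URL; hdfs and mapreduce are merged into it).
    """
    project, sep, rest = path.partition("/trunk")
    if project not in PROJECTS or not sep or (rest and not rest.startswith("/")):
        raise ValueError(f"not a project trunk path: {path!r}")
    if layout == "original":
        return f"trunk/{project}{rest}"
    if layout == "owen":
        return f"common/trunk/{project}{rest}"
    raise ValueError(f"unknown layout: {layout!r}")
```

For example, remap("hdfs/trunk/build.xml", "original") gives "trunk/hdfs/build.xml", while the "owen" layout gives "common/trunk/hdfs/build.xml". In the second layout the common trunk URL is unchanged, which is the property the message relies on to preserve the git hashes and history for common.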

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Suresh Srinivas <su...@yahoo-inc.com>.

Nigel,

I have committed federation merge into trunk. Thank you for waiting for it
to be done.

Regards,
Suresh


On 4/28/11 10:06 PM, "Nigel Daley" <nd...@mac.com> wrote:

> As announced last week, I'm planning to do this at 2pm PDT tomorrow (Friday)
> April 29.  
> 
> Suresh, when do you plan to commit HDFS-1052?  That should be done first.
> 
> Owen or Todd, did you want to follow Paul's advice:
>> If you're really wanting to make sure to keep the history in Git
>> intact my suggestion would be to setup a temporary svn server locally
>> and test our mirroring scripts against the commands you intend to run.
> If so, how much more time do you need?
> 
> Cheers,
> Nige
> 
> On Apr 20, 2011, at 9:42 PM, Nigel Daley wrote:
> 
>> Owen, I'll admit I'm not familiar with all the git details/issues in your
>> proposal, but I think the layout change you propose is fine and seems to
>> solve the git issues with very minimal impact on the layout.
>> 
>> Let's shoot for doing this next Friday, April 29 at 2pm PDT.  I'll update the
>> patch and send out a reminder about this later next week.
>> 
>> Thanks,
>> Nige
>> 
>> On Apr 20, 2011, at 8:00 AM, Owen O'Malley wrote:
>> 
>>> 
>>> On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote:
>>> 
>>>> On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>> 
>>>>> 
>>>>> I'm currently looking into how the git mirrors are setup in Apache-land.
>>> 
>>> Uh, why isn't infra-dev on this thread?
>>> 
>>> For those on infra-dev, the context is that Nigel is trying to merge
>>> together the source trees of the Hadoop sub-projects that were split apart 2
>>> years ago. So he is taking:
>>> 
>>> prefix = http://svn.apache.org/repos/asf/hadoop/
>>> 
>>> $prefix/common/trunk -> $prefix/trunk/common
>>> $prefix/hdfs/trunk -> $prefix/trunk/hdfs
>>> $prefix/mapreduce/trunk -> $prefix/trunk/mapreduce
>>> 
>>> and play similar games with the rest of the branches and tags. For more
>>> details look at HADOOP-7106.
>>> 
>>> From the project split, subversion was able to track the history across the
>>> subversion moves between projects, but not git.
>>> 
>>> Four questions:
>>> 1. Is there anything we can do to minimize the history loss in git?
>>> 2. Are we going to be able to preserve our sha's or are they going to change
>>> again?
>>> 3. What changes do we need to make to the subversion notification file?
>>> 4. Are there any other changes that need to be coordinated?
>>> 
>>> After considering it this morning, I believe that the least disruptive move
>>> is to leave common at the same url and merge hdfs and mapreduce back in:
>>> 
>>> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
>>> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
>>> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce
>>> 
>>> This will preserve the hashes and history for common (and the 20 branches).
>>> We'll still need to play git voodoo to get git history for hdfs and
>>> mapreduce, but it is far better than starting a brand new git clone.
>>> 
>>> -- Owen
>>> 
>>> 
>> 
>

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Nigel Daley <nd...@mac.com>.

As announced last week, I'm planning to do this at 2pm PDT tomorrow (Friday) April 29.  

Suresh, when do you plan to commit HDFS-1052?  That should be done first.

Owen or Todd, did you want to follow Paul's advice:
> If you're really wanting to make sure to keep the history in Git
> intact my suggestion would be to setup a temporary svn server locally
> and test our mirroring scripts against the commands you intend to run.
If so, how much more time do you need?

Cheers,
Nige

On Apr 20, 2011, at 9:42 PM, Nigel Daley wrote:

> Owen, I'll admit I'm not familiar with all the git details/issues in your proposal, but I think the layout change you propose is fine and seems to solve the git issues with very minimal impact on the layout.
> 
> Let's shoot for doing this next Friday, April 29 at 2pm PDT.  I'll update the patch and send out a reminder about this later next week.
> 
> Thanks,
> Nige
> 
> On Apr 20, 2011, at 8:00 AM, Owen O'Malley wrote:
> 
>> 
>> On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote:
>> 
>>> On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>> 
>>>> 
>>>> I'm currently looking into how the git mirrors are setup in Apache-land.
>> 
>> Uh, why isn't infra-dev on this thread?
>> 
>> For those on infra-dev, the context is that Nigel is trying to merge together the source trees of the Hadoop sub-projects that were split apart 2 years ago. So he is taking:
>> 
>> prefix = http://svn.apache.org/repos/asf/hadoop/
>> 
>> $prefix/common/trunk -> $prefix/trunk/common
>> $prefix/hdfs/trunk -> $prefix/trunk/hdfs
>> $prefix/mapreduce/trunk -> $prefix/trunk/mapreduce
>> 
>> and play similar games with the rest of the branches and tags. For more details look at HADOOP-7106.
>> 
>> From the project split, subversion was able to track the history across the subversion moves between projects, but not git.
>> 
>> Four questions:
>> 1. Is there anything we can do to minimize the history loss in git?
>> 2. Are we going to be able to preserve our sha's or are they going to change again?
>> 3. What changes do we need to make to the subversion notification file?
>> 4. Are there any other changes that need to be coordinated?
>> 
>> After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in:
>> 
>> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
>> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
>> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce
>> 
>> This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone.
>> 
>> -- Owen
>> 
>> 
>

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Nigel Daley <nd...@mac.com>.

Owen, I'll admit I'm not familiar with all the git details/issues in your proposal, but I think the layout change you propose is fine and seems to solve the git issues with very minimal impact on the layout.

Let's shoot for doing this next Friday, April 29 at 2pm PDT.  I'll update the patch and send out a reminder about this later next week.

Thanks,
Nige

On Apr 20, 2011, at 8:00 AM, Owen O'Malley wrote:

> 
> On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote:
> 
>> On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com> wrote:
>> 
>>> 
>>> I'm currently looking into how the git mirrors are setup in Apache-land.
> 
> Uh, why isn't infra-dev on this thread?
> 
> For those on infra-dev, the context is that Nigel is trying to merge together the source trees of the Hadoop sub-projects that were split apart 2 years ago. So he is taking:
> 
> prefix = http://svn.apache.org/repos/asf/hadoop/
> 
> $prefix/common/trunk -> $prefix/trunk/common
> $prefix/hdfs/trunk -> $prefix/trunk/hdfs
> $prefix/mapreduce/trunk -> $prefix/trunk/mapreduce
> 
> and play similar games with the rest of the branches and tags. For more details look at HADOOP-7106.
> 
> From the project split, subversion was able to track the history across the subversion moves between projects, but not git.
> 
> Four questions:
> 1. Is there anything we can do to minimize the history loss in git?
> 2. Are we going to be able to preserve our sha's or are they going to change again?
> 3. What changes do we need to make to the subversion notification file?
> 4. Are there any other changes that need to be coordinated?
> 
> After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in:
> 
> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce
> 
> This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone.
> 
> -- Owen
> 
>

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Konstantin Boudnik <co...@boudnik.org>.

On Wed, Apr 20, 2011 at 08:00, Owen O'Malley <om...@apache.org> wrote:
> After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in:
>
> $prefix/common/trunk/* -> $prefix/common/trunk/common/*
> $prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
> $prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce

This seems like adding insult to injury: it creates an artificial
SVN layout and moves what are deemed to be unrelated components into
common's namespace.

The original proposal seems much more straightforward.

Cos

> This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone.
>
> -- Owen

>

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Paul Davis <pa...@gmail.com>.

> From the project split, subversion was able to track the history across the subversion moves between projects, but not git.
>
> Four questions:
>  1. Is there anything we can do to minimize the history loss in git?

Glancing at the Git history on GitHub for Hadoop HDFS, my guess is that
the mirror script is just pulling from $prefix/hdfs. I'm not entirely
certain how aggressively it will look for moves from elsewhere in the
svn repo, i.e. for history outside the root URL.

If you're really wanting to make sure to keep the history in Git
intact my suggestion would be to set up a temporary svn server locally
and test our mirroring scripts against the commands you intend to run.
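
A rough sketch of what such a rehearsal could look like, written as a plain command plan rather than anything that touches a server. It assumes a throwaway repository reached over a file:// URL stands in for the "temporary svn server"; the function name, the scratch directory and the commit message are made up for illustration, and the mirroring scripts themselves are not modelled.

```python
from pathlib import Path


def rehearsal_commands(scratch_dir, moves):
    """Return argv lists that build a scratch repository and replay the moves.

    moves is a list of (source, destination) paths relative to the repository
    root, e.g. ("hdfs/trunk", "trunk/hdfs"). Nothing is executed here.
    """
    repo = Path(scratch_dir).resolve() / "repo"
    url = repo.as_uri()  # file:// URL of the local repository
    msg = ["-m", "rehearsal"]  # hypothetical commit message
    commands = [
        ["svnadmin", "create", str(repo)],
        # Recreate a miniature version of the current split layout.
        ["svn", "mkdir", "--parents", *msg,
         f"{url}/common/trunk", f"{url}/hdfs/trunk", f"{url}/mapreduce/trunk"],
    ]
    # Replay the moves under test; --parents creates missing target parents.
    for src, dst in moves:
        commands.append(["svn", "mv", "--parents", *msg, f"{url}/{src}", f"{url}/{dst}"])
    return commands
```

Each returned list could be handed to subprocess.run on a machine with the Subversion client installed, after which the mirroring scripts would be pointed at the same file:// URL to see what history they pick up.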

Paul Davis

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Owen O'Malley <om...@apache.org>.

On Apr 19, 2011, at 10:58 PM, Todd Lipcon wrote:

> On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com> wrote:
> 
>> 
>> I'm currently looking into how the git mirrors are setup in Apache-land.

Uh, why isn't infra-dev on this thread?

For those on infra-dev, the context is that Nigel is trying to merge together the source trees of the Hadoop sub-projects that were split apart 2 years ago. So he is taking:

prefix = http://svn.apache.org/repos/asf/hadoop/

$prefix/common/trunk -> $prefix/trunk/common
$prefix/hdfs/trunk -> $prefix/trunk/hdfs
$prefix/mapreduce/trunk -> $prefix/trunk/mapreduce

and play similar games with the rest of the branches and tags. For more details look at HADOOP-7106.

From the project split, subversion was able to track the history across the subversion moves between projects, but not git.

Four questions:
 1. Is there anything we can do to minimize the history loss in git?
 2. Are we going to be able to preserve our sha's or are they going to change again?
 3. What changes do we need to make to the subversion notification file?
 4. Are there any other changes that need to be coordinated?

After considering it this morning, I believe that the least disruptive move is to leave common at the same url and merge hdfs and mapreduce back in:

$prefix/common/trunk/* -> $prefix/common/trunk/common/*
$prefix/hdfs/trunk -> $prefix/common/trunk/hdfs
$prefix/mapreduce/trunk -> $prefix/common/trunk/mapreduce

This will preserve the hashes and history for common (and the 20 branches). We'll still need to play git voodoo to get git history for hdfs and mapreduce, but it is far better than starting a brand new git clone.
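
The mapping above can be written out as data, which makes it easier to review before anything is run. This is only a sketch: the function names, the commit message and the example child entries are hypothetical, and it says nothing about how the HADOOP-7106 script actually performs the moves.

```python
# Prefix from the thread, with the trailing slash dropped.
PREFIX = "http://svn.apache.org/repos/asf/hadoop"


def merged_layout_moves(prefix, common_children):
    """Return (source, destination) URL pairs for the proposed trunk moves.

    common_children is a caller-supplied listing of the entries currently
    directly under common/trunk; each one moves down into common/trunk/common/.
    """
    trunk = f"{prefix}/common/trunk"
    moves = [(f"{trunk}/{name}", f"{trunk}/common/{name}") for name in common_children]
    # hdfs and mapreduce move in whole, as siblings of the new common/ directory.
    for project in ("hdfs", "mapreduce"):
        moves.append((f"{prefix}/{project}/trunk", f"{trunk}/{project}"))
    return moves


def as_svn_commands(moves, message="HADOOP-7106: re-organize layout"):
    """Render each move as an `svn mv SRC DST -m MSG` argument list."""
    return [["svn", "mv", src, dst, "-m", message] for src, dst in moves]
```

Note that the common/trunk/common directory has to exist before its children can be moved into it, and that a URL-to-URL `svn mv` commits immediately, so running these one by one would produce one revision per move.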

-- Owen

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Todd Lipcon <to...@cloudera.com>.

On Tue, Apr 19, 2011 at 10:20 PM, Todd Lipcon <to...@cloudera.com> wrote:

>
> I'm currently looking into how the git mirrors are setup in Apache-land.
>

Git-wise, I think we have two options:

Option 1)
- Create a new git mirror for the new hadoop/ tree. This will have no
history.
- On the Apache side, fetch the split-project git mirrors into the combined
git mirror as branches, e.g. hadoop-hdfs.git:trunk becomes a branch named
something like pre-HADOOP-7106/hdfs/trunk. Thus, when any user fetches,
he'll get all the git objects from "prehistory" as well without having to
add separate remotes.
- Add a script or README file explaining how to set up git grafts on the
combined hadoop.git so that the new combination branch "foo" looks like a
merge of pre-HADOOP-7106/{hdfs,common,mapred}/foo. Since git grafts are
local constructs, each git user would have to run this script once after
checking out the git tree, after which the history would be "healed".

Pros:
 - all existing sha1s stay the same.
 - Any local branches people might have for works in progress should
continue to refer to proper SHA1s and should rebase relatively easily onto
the combined trunk
 - Should be reasonably simple to implement

Cons:
 - users have to run a script upon checkout in order to graft back together
history
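
To make option 1 a bit more concrete, here is a minimal sketch of the two mechanical pieces: the refspec that would land a split mirror's branches under pre-HADOOP-7106/<project>/, and a helper that writes entries to the local grafts file. It relies on the grafts file living at info/grafts inside the git directory, with one line per commit giving the commit id followed by the parent ids it should appear to have. The function names are hypothetical, and the commit ids would have to come from the real repositories.

```python
from pathlib import Path


def fetch_refspec(project):
    """Refspec mapping a split mirror's branches under pre-HADOOP-7106/<project>/."""
    return f"+refs/heads/*:refs/heads/pre-HADOOP-7106/{project}/*"


def graft_line(commit, parents):
    """One grafts entry: the commit id, then the parents it should appear to have."""
    return " ".join([commit, *parents])


def install_grafts(git_dir, entries):
    """Append graft entries to <git_dir>/info/grafts, skipping ones already present.

    entries is a list of (commit_id, [parent_id, ...]) pairs. Returns the lines
    that were actually added, so running it twice is harmless.
    """
    grafts = Path(git_dir) / "info" / "grafts"
    grafts.parent.mkdir(parents=True, exist_ok=True)
    seen = set(grafts.read_text().splitlines()) if grafts.exists() else set()
    added = []
    with grafts.open("a") as fh:
        for commit, parents in entries:
            line = graft_line(commit, parents)
            if line not in seen:
                fh.write(line + "\n")
                seen.add(line)
                added.append(line)
    return added
```

For the combined branch "foo", the entry would name the first post-move commit and list the tips of the three pre-HADOOP-7106 branches as its parents, which is what makes it look like a merge locally.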

Option 2)
- Use git-filter-branch on the split repos to rewrite them as if they always
took place in their new subdirectories.
- Fetch these repos into the merged repo
- Set up grafts in the merged repo
- Run git-filter-branch --all in the merged repo, which will make the grafts
permanent
- May have to run git-filter-branch to rewrite some of the git-svn-id:
lines in the commit messages to trick git-svn.

This option basically rewrites history so that it looks like the original
project split did what we're planning to do now.

Pros:
 - we have a single cohesive git repo with no need to have users set up
grafts

Cons:
 - all of our SHA1s between the original split and now would change (making
it harder to rebase local branches for example)
 - way more opportunity for error, I think.
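
The first con follows from how commit ids are derived: an id covers the tree contents and the parent ids, so rewriting paths in an early commit changes that commit's id and, through the parent links, every id after it. The toy model below illustrates only that dependency. It is not git's real object format, and the names and the example history are invented.

```python
import hashlib


def toy_commit_id(tree, parent_ids):
    """Hash a commit from its paths/contents and its parents' ids (toy format)."""
    h = hashlib.sha1()
    for path in sorted(tree):
        h.update(path.encode() + b"\0" + tree[path].encode() + b"\0")
    for parent in parent_ids:
        h.update(b"parent " + parent.encode() + b"\n")
    return h.hexdigest()


def history_ids(trees, subdir=None):
    """Ids for a linear history, optionally rewriting every path under subdir/."""
    ids, parents = [], []
    for tree in trees:
        if subdir is not None:
            # The rewrite: pretend the files always lived under subdir/.
            tree = {f"{subdir}/{path}": data for path, data in tree.items()}
        commit = toy_commit_id(tree, parents)
        ids.append(commit)
        parents = [commit]
    return ids
```

Comparing history_ids(trees) with history_ids(trees, "hdfs") for any non-empty history gives a different id at every position, which is why local branches based on the old ids would need extra care to rebase.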

I'm leaning towards option 1 above, and happy to write the script which
installs the grafts into the user's local repo.

-Todd


>
>> On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote:
>>
>> All,
>>
>> As discussed in Jan/Feb, I'd like to coordinate a date for committing the
>> re-organization of our svn layout:
>> https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday
>> April 21 at 11am PDT.
>>
>> - I will send out reminders leading up to that date.
>> - I will announce on IRC when I'm about to start the changes.
>> - I will run the script to make the changes.
>> - Ian, can you update the asf-authorization-template file and the
>> asf-mailer.conf files at the same time?
>> - Owen/Todd/Jukka, can you make sure that actions needed by git users are
>> taken care of at the same time? (what are these?)
>>
>> More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit
>>
>> Cheers,
>> Nige
>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Todd Lipcon <to...@cloudera.com>.

On Tue, Apr 19, 2011 at 10:02 PM, Nigel Daley <nd...@mac.com> wrote:

> I'm still planning to make this SVN change on Thursday this week.
>
> Ian, Owen, Todd, note the questions I ask you below.  Can you help with
> these on Thursday?
>

Unfortunately I'm out of the office most of the day on Thursday with a
customer. I'll be available Thursday evening, though, to help with any
cleanup/etc.

I'm currently looking into how the git mirrors are set up in Apache-land.

My guess is that there will be some disturbance to developers on Thurs
afternoon / Friday as this gets sorted out, even if we try to plan as much
as possible. Would it be better to do this on Friday so that we have the
weekend to fix up broken pieces before people get to work on Monday?

-Todd

> On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote:
>
> All,
>
> As discussed in Jan/Feb, I'd like to coordinate a date for committing the
> re-organization of our svn layout:
> https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday
> April 21 at 11am PDT.
>
> - I will send out reminders leading up to that date.
> - I will announce on IRC when I'm about to start the changes.
> - I will run the script to make the changes.
> - Ian, can you update the asf-authorization-template file and the
> asf-mailer.conf files at the same time?
> - Owen/Todd/Jukka, can you make sure that actions needed by git users are
> taken care of at the same time? (what are these?)
>
> More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit
>
> Cheers,
> Nige
>
>
>

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: HADOOP-7106: Re-organize hadoop subversion layout

Posted by Nigel Daley <nd...@mac.com>.

I'm still planning to make this SVN change on Thursday this week.  

Ian, Owen, Todd, note the questions I ask you below.  Can you help with these on Thursday?

Thanks,
Nige

On Apr 9, 2011, at 11:09 PM, Nigel Daley wrote:

> All, 
> 
> As discussed in Jan/Feb, I'd like to coordinate a date for committing the re-organization of our svn layout: https://issues.apache.org/jira/browse/HADOOP-7106.  I propose Thursday April 21 at 11am PDT.
> 
> - I will send out reminders leading up to that date.
> - I will announce on IRC when I'm about to start the changes.
> - I will run the script to make the changes.
> - Ian, can you update the asf-authorization-template file and the asf-mailer.conf files at the same time?
> - Owen/Todd/Jukka, can you make sure that actions needed by git users are taken care of at the same time? (what are these?)
> 
> More info on this change is at http://wiki.apache.org/hadoop/ProjectSplit
> 
> Cheers,
> Nige