Posted to dev@climate.apache.org by Michael Joyce <jo...@apache.org> on 2014/03/10 18:06:40 UTC

Our new repository is a bit bloated

Hi guys,

An unfortunate side effect of our export from SVN to Git is that we've
ended up with a rather bloated repository. We've had a large number of
binary files in our repo in the past and all of this has been rolled up
into an obnoxious ~500 MB pack file. I've been completely unable to clone
the repo on my home internet because it constantly times out and it's
painfully slow on my faster work connection.

To fix this problem I suggest we do the following:
- Remove all binary files from our repo and host them externally. For
example, NetCDF files can be downloaded when they're needed and cleaned up
afterwards (for tests or examples).
- Remove all the bloat from our pack file. I was digging through stuff
earlier and found a number of very large and outdated files in our pack
file (~300 MB NC file, internal JPL presentations/files from a long time
ago, etc.). We should be able to use [1] to help automate this for us,
although we can also take care of it on our own if need be.

Let me know what you guys think the best course of action is. That being
said, dealing with this sooner rather than later would be nice =D

[1] https://github.com/cmaitchison/git_diet

-- Joyce

Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
All,

I'm going to push these changes now. Check out CLIMATE-384 for updates.


-- Joyce


On Fri, Mar 14, 2014 at 2:59 PM, Cameron Goodale <go...@apache.org> wrote:

> Mike,
>
> I just cloned your repo and all of the tests in 'ocw' passed on my machine.
>
> Cheers,
>
> Cameron

Re: Our new repository is a bit bloated

Posted by Cameron Goodale <go...@apache.org>.
Mike,

I just cloned your repo and all of the tests in 'ocw' passed on my machine.

Cheers,

Cameron
On Mar 14, 2014 12:25 PM, "Michael Joyce" <jo...@apache.org> wrote:

> Update on this. I've tested this on a fork to make sure that it works
> properly. Everything seems fine. I've taken the old repository and removed
> all the old large binaries and leftover JPL related things. I've gone
> through and run some tests, including a sample evaluation through the UI
> and the backend, and everything seems kosher.
>
> The repo is down to ~13 MB now.
>
> It would be a big help if everyone could go pull down the new code [1] and
> make sure everything looks alright. I'll wait a bit before pushing any
> changes to the ASF so everyone has adequate time to test.
>
> [1] https://github.com/MJJoyce/climate
>
>
>
>
> -- Joyce
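One way to check a figure like "~13 MB" on a fresh clone is `git count-objects -v`. It reports the loose-object size and the pack size in KiB. The sketch below is a hypothetical helper that runs the command and adds the two numbers. It measures the local object store. For a fresh clone, that should be close to, but not exactly, what a clone has to transfer.

```python
import subprocess
from typing import Dict


def parse_count_objects(output: str) -> Dict[str, str]:
    """Turn the "key: value" lines of `git count-objects -v` into a dict."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def object_store_kib(output: str) -> int:
    """Loose objects ("size") plus packed objects ("size-pack"), both reported in KiB."""
    stats = parse_count_objects(output)
    return int(stats.get("size", "0")) + int(stats.get("size-pack", "0"))


def measure(repo_dir: str = ".") -> int:
    """Run `git count-objects -v` in repo_dir and return the object store size in KiB."""
    out = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return object_store_kib(out)
```

Dividing `measure("climate")` by 1024 gives the figure in MiB.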
>
>
> On Wed, Mar 12, 2014 at 2:49 PM, Michael Joyce <jo...@apache.org> wrote:
>
> > Awesome, I'm glad we got this sorted out. Thanks for all the hard work!
> >
> >
> > -- Joyce
> >
> >
> > On Wed, Mar 12, 2014 at 2:45 PM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
> >
> >> Happy to be your project guinea pig! :->
> >>
> >> After recompiling "git" with "libcurl" (./configure --with-curl) I was
> >> able to push the changes. It seems you need curl to get access to https.
> >> Your git remote command was very useful. I did not have to clone the
> >> repo, copy my files over and redo the "git" commands.
> >>
> >> I made quite a few changes to obs4MIPs and need to "diff" and push them
> >> to the repository. So it seems that I am good to go with "git". :-)
> >>
> >> Great work and thanks for your help!
> >> Denis
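Git implements its HTTP and HTTPS transports through libcurl. A build made without libcurl lacks the `git-remote-https` helper, which fits what Denis ran into. One quick way to check an existing install, sketched in Python below, is to look for that helper in the directory printed by `git --exec-path`. The function names are hypothetical. Packagers could in principle install helpers elsewhere, so treat a negative result as a hint rather than proof.

```python
import os
import subprocess


def https_helper_present(exec_path: str) -> bool:
    """True if git's HTTPS transport helper exists in the given exec-path.

    git-remote-http/git-remote-https are only built when libcurl is available,
    so a build without curl support will not have them.
    """
    names = ("git-remote-https", "git-remote-https.exe")
    return any(os.path.isfile(os.path.join(exec_path, name)) for name in names)


def git_supports_https() -> bool:
    """Ask the installed git where its helpers live, then look for the HTTPS one."""
    exec_path = subprocess.run(
        ["git", "--exec-path"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return https_helper_present(exec_path)
```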
> >>
> >>
> >> On 3/12/14 5:01 PM, Michael Joyce wrote:
> >>
> >>> Ah good, we're getting close! We'd be even closer if I hadn't messed up
> >>> in a previous git-related email!
> >>>
> >>> Our git://git.apache.org/climate.git mirror is our read-only git mirror.
> >>> That would explain why you aren't able to write to it.
> >>>
> >>> We need to use:
> >>> https://git-wip-us.apache.org/repos/asf/climate.git
> >>>
> >>> If we didn't have a commit bit, we would instead use (http vs https)
> >>> http://git-wip-us.apache.org/repos/asf/reponame.git
> >>>
> >>> I misread some documentation at [1] and [2] and confused myself. I
> >>> thought the "WIP" or "Work in Progress" label was for migration only.
> >>> Silly me.
> >>>
> >>> We can fix this fairly easily by running
> >>> $ git remote set-url origin https://git-wip-us.apache.org/repos/asf/climate.git
> >>>
> >>> Then, you should see updated URLs with
> >>> $ git remote -v
> >>>
> >>> At that point you should be able to push successfully.
> >>>
> >>> Sorry that you've turned into our project guinea pig, Denis! I had hoped
> >>> to smooth out some of these rough edges this last weekend/early this
> >>> week, but unfortunately I haven't been able to do so. We'll get there
> >>> though!
> >>>
> >>> [1] https://www.apache.org/dev/writable-git
> >>> [2] https://git-wip-us.apache.org/
> >>>
> >>>
> >>> -- Joyce
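The mix-up above shows up directly in `git remote -v`: a push URL that starts with `git://` points at the read-only mirror. A small, hypothetical Python check for that is sketched below. It only parses the command's output and builds the `git remote set-url` arguments from the message; it does not run anything itself.

```python
from typing import Dict, List

# Writable ASF endpoint for committers, as given in the message above.
WRITABLE_URL = "https://git-wip-us.apache.org/repos/asf/climate.git"


def push_urls(remote_v_output: str) -> Dict[str, str]:
    """Map each remote name to its push URL, from `git remote -v` output."""
    urls = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Lines look like: "origin  git://git.apache.org/climate.git (push)"
        if len(parts) == 3 and parts[2] == "(push)":
            urls[parts[0]] = parts[1]
    return urls


def read_only_remotes(remote_v_output: str) -> Dict[str, str]:
    """Remotes that push over git://, which the ASF mirror only serves read-only."""
    return {name: url for name, url in push_urls(remote_v_output).items()
            if url.startswith("git://")}


def set_url_command(name: str = "origin") -> List[str]:
    """Argument list for the suggested fix; pass it to subprocess.run to apply it."""
    return ["git", "remote", "set-url", name, WRITABLE_URL]
```

If `read_only_remotes` reports `origin`, running the `set_url_command()` arguments and then `git remote -v` again should show the https URL for both fetch and push.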
> >>>
> >>>
> >>> On Wed, Mar 12, 2014 at 12:44 PM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
> >>>
> >>>> Joyce,
> >>>>
> >>>> This is a great introduction and will help other SVN/CVS developers. (I
> >>>> did not know you had to "git add" every change.)
> >>>>
> >>>> Right now, I just can't push to github.  I think it might be a
> >>>> configuration issue.  Do you need my ssh keys or something for me to
> >>>> 'push'?
> >>>>
> >>>>     git push origin master
> >>>>
> >>>>         fatal: The remote end hung up unexpectedly
> >>>>
> >>>>
> >>>>     git status
> >>>>
> >>>>         # On branch master
> >>>>         # Your branch is ahead of 'origin/master' by 4 commits.
> >>>>         #
> >>>>         nothing to commit (working directory clean)
> >>>>
> >>>>     git remote -v
> >>>>
> >>>>         origin  git://git.apache.org/climate.git (fetch)
> >>>>         origin  git://git.apache.org/climate.git (push)
> >>>>
> >>>> Thanks for your help. (almost there...)
> >>>> Denis
> >>>>
> >>>>
> >>>> On 3/12/14 12:37 PM, Michael Joyce wrote:
> >>>>
> >>>>> Ah, let me explain, since git is just a bit different from SVN.
> >>>>>
> >>>>> When you commit in git you aren't actually committing to the primary
> >>>>> server like you are in SVN. You're committing to your local repository.
> >>>>> In order to mirror those changes to the ASF you will need to run
> >>>>> "git push". So "git status" is telling you that you've committed 4
> >>>>> times and those changes aren't mirrored on the server that you ran
> >>>>> "git clone" from.
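The "ahead of 'origin/master' by 4 commits" line counts commits that exist in the local `master` but not in the copy of the server's branch as of the last fetch or push. The same numbers can be read with `git rev-list --left-right --count origin/master...master`. That command prints two counts: commits found only on the left side, then commits found only on the right side. A minimal Python sketch, with hypothetical function names:

```python
import subprocess
from typing import Tuple


def parse_left_right(count_output: str) -> Tuple[int, int]:
    """Parse `git rev-list --left-right --count A...B` output into (behind, ahead).

    With A = origin/master and B = master, the left number counts commits
    only on the server's branch and the right number counts local-only commits.
    """
    behind, ahead = (int(field) for field in count_output.split())
    return behind, ahead


def ahead_behind(repo_dir: str = ".", upstream: str = "origin/master",
                 branch: str = "master") -> Tuple[int, int]:
    """Return (behind, ahead) for branch relative to upstream."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{upstream}...{branch}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_left_right(out)
```

In the situation Denis describes, this would give `(0, 4)`. After a successful `git push origin master`, with no new commits on the server, it should drop to `(0, 0)`.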
> >>>>>
> >>>>> To be safe, you might want to check out a clean copy of the repo from
> >>>>> the ASF (which should only take forever =) and then try again. We
> >>>>> could go through each of the commits and make sure they're the way you
> >>>>> want them to be, but that might end up being more trouble than it's
> >>>>> worth if we try to do it via email. This is the workflow that I would
> >>>>> probably follow:
> >>>>>
> >>>>> # Remove the files that you don't want anymore. I'm going to say that
> >>>>> # we're sitting in the root of our repo and the files are in
> >>>>> # 'obs4MIPs/examples'
> >>>>> $ git rm -r obs4MIPs/examples
> >>>>> $ git status
> >>>>> # You should now see a number of files being marked as "staged for
> >>>>> # commit".
> >>>>> # Go ahead and commit these removals
> >>>>> $ git commit -m "Removing obs4MIPs example .nc files"
> >>>>>
> >>>>> # Now if you run git status you shouldn't see any files listed, but it
> >>>>> # will say that you're ahead of origin/master by 1 commit
> >>>>>
> >>>>> # Now add the README and any other files you've updated
> >>>>>
> >>>>> $ git add .
> >>>>> # It's important to note that "add" in git is not the same as "add" in
> >>>>> # svn. Add in git means "add/stage these changes for the next commit".
> >>>>> # If you're used to svn this can be a bit confusing. In git you need to
> >>>>> # add changes every time you want to commit, as opposed to svn where
> >>>>> # you only "add" the file to the repo once.
> >>>>>
> >>>>> $ git status
> >>>>> # You should see all the files that you changed present and "staged for
> >>>>> # commit". When something is "staged for commit" that means that it
> >>>>> # will be committed the next time we run git commit.
> >>>>> $ git commit -m "Update blah blah blah"
> >>>>>
> >>>>> # Now you should see that you're ahead by a few commits, depending on
> >>>>> # how many times you've committed.
> >>>>> # At this point you probably want to share all your changes with
> >>>>> # everyone, so we'll push the changes up to the server.
> >>>>>
> >>>>> # You can abbreviate this to just 'git push' or 'git push origin'.
> >>>>> # We're going to play it safe and be super explicit.
> >>>>> # This is telling git to push all the changes that you've committed in
> >>>>> # your 'master' branch (which is the default one that you've been
> >>>>> # working in) to the remote named "origin". By default, the repo that
> >>>>> # you cloned from is named "origin".
> >>>>> $ git push origin master
> >>>>>
> >>>>> At this point we should get some emails saying you committed.
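The "staged for commit" idea is easiest to see in `git status --porcelain`. Each line there starts with two status letters. The first describes the index, meaning what the next commit will contain. The second describes the working tree.

The Python sketch below, with hypothetical names, sorts paths into staged, unstaged and untracked. A file edited again after `git add` appears in both of the first two lists. That is why git needs another `add` before the next commit.

```python
from typing import Dict, List


def classify_status(porcelain: str) -> Dict[str, List[str]]:
    """Split `git status --porcelain` output into staged, unstaged and untracked paths.

    Each line is "XY PATH": X is the index (staged) status and Y is the
    working-tree status; "??" marks an untracked file.
    """
    result: Dict[str, List[str]] = {"staged": [], "unstaged": [], "untracked": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        x, y, path = line[0], line[1], line[3:]
        if x == "?" and y == "?":
            result["untracked"].append(path)
            continue
        if x != " ":
            result["staged"].append(path)  # will go into the next commit
        if y != " ":
            result["unstaged"].append(path)  # changed since the last `git add`
    return result
```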
> >>>>>
> >>>>> Hopefully that helps a bit. If you have more questions let me know. It
> >>>>> can certainly be a bit jarring of a change moving to git from svn. I'm
> >>>>> working on writing up a brief "intro to git" that I will send around to
> >>>>> the mailing list once it's in a useful state. It should hopefully help
> >>>>> clear up some confusion for everyone.
> >>>>>
> >>>>>
> >>>>> -- Joyce
> >>>>>
> >>>>>
> >>>>> On Wed, Mar 12, 2014 at 9:04 AM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
> >>>>>
> >>>>>> Joyce,
> >>>>>>
> >>>>>> I did commit the change and also removed ("rm") the .nc files. I did
> >>>>>> not see an email either. Here are the 3 commands I used:
> >>>>>>
> >>>>>> 1. git add
> >>>>>> 2. git commit
> >>>>>> 3. git rm
> >>>>>>
> >>>>>>
> >>>>>> I guess "git rm" does not need a commit command.
> >>>>>> When I run "git status" I get this message. I am not sure what "ahead
> >>>>>> of 'origin/master' by 4 commits" means!
> >>>>>>
> >>>>>>      git status
> >>>>>>      # On branch master
> >>>>>>      # Your branch is ahead of 'origin/master' by 4 commits.
> >>>>>>      #
> >>>>>>      nothing to commit (working directory clean)
> >>>>>>
> >>>>>>
> >>>>>> Denis
> >>>>>>
> >>>>>> On 3/12/14 11:19 AM, Michael Joyce wrote:
> >>>>>>
> >>>>>>> Awesome, Denis, thanks much. I will play around with this more soon
> >>>>>>> and see if I can't strip out some more files. Did you push your
> >>>>>>> changes up to the repo? I didn't see a commit email come through, but
> >>>>>>> I'm not certain my filters are working correctly with the mailing
> >>>>>>> list migrations.
> >>>>>>>
> >>>>>>>
> >>>>>>> -- Joyce
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Mar 12, 2014 at 7:20 AM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
> >>>>>>>
> >>>>>>>> Joyce,
> >>>>>>>>
> >>>>>>>> I deleted the .nc files found in my example directory for TRMM and
> >>>>>>>> ECMWF. I have added a README file that explains to users how to
> >>>>>>>> retrieve the data from the original data provider. TRMM is pretty
> >>>>>>>> straightforward, but for ECMWF you need to register, obtain a key and
> >>>>>>>> download their Python package.
> >>>>>>>>
> >>>>>>>> It works pretty well on my machine; let's see what users say.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Denis
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 3/10/14 3:53 PM, Michael Joyce wrote:
> >>>>>>>>
> >>>>>>>>> I think that would be great, Denis! I can go ahead and look at doing
> >>>>>>>>> something similar for the other ocw/ocw-ui components as well. I'm
> >>>>>>>>> sure this will help us out a good bit.
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -- Joyce
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 10, 2014 at 11:20 AM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
> >>>>>>>>>
> >>>>>>>>>> Michael,
> >>>>>>>>>>
> >>>>>>>>>> I like the idea of having the NetCDF files in an external repository.
> >>>>>>>>>>
> >>>>>>>>>> I was thinking that it might be better to point people to the
> >>>>>>>>>> satellite data at the different DAACs so that they can download the
> >>>>>>>>>> files directly. That would work for the "obs4MIPs" program. I would
> >>>>>>>>>> feel better about it as well; I have been worried that some data
> >>>>>>>>>> providers (ECMWF) would tell us we are not authorized to distribute
> >>>>>>>>>> their original data. I initially did not think about this when I
> >>>>>>>>>> checked in my original code.
> >>>>>>>>>>
> >>>>>>>>>> I just found out that ECMWF now allows people to download their data
> >>>>>>>>>> in "NetCDF" instead of "GRIB" using Python [1]. I tried it before,
> >>>>>>>>>> but could only retrieve GRIB data and did not want to mess with GrADS
> >>>>>>>>>> ".ctl" files and the CDMS2/CDAT package. So now, I could just create a
> >>>>>>>>>> script to download the right files and rename them to the appropriate
> >>>>>>>>>> filenames for the obs4MIPs examples.
> >>>>>>>>>>
> >>>>>>>>>> I would feel much better about this. Let me know what you think.
> >>>>>>>>>>
> >>>>>>>>>> [1] https://software.ecmwf.int/wiki/display/WEBAPI/Accessing+ECMWF+data+servers+in+batch
> >>>>>>>>>>
> >>>>>>>>>> Denis
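Denis describes a script that downloads the right files and renames them. The original proposal also suggested fetching NetCDF files only when needed and cleaning them up afterwards. Both ideas could be combined in a Python context manager like the one below.

This is a sketch. The mapping from local filenames to URLs is up to the caller, and no specific TRMM or ECMWF URL is assumed. ECMWF data in particular would still need their registered Python client rather than a plain download.

```python
import contextlib
import os
import tempfile
import urllib.request
from typing import Dict, Iterator


@contextlib.contextmanager
def fetched_files(sources: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Download each URL into a temporary directory under the given local filename.

    `sources` maps the filename an example expects to the URL it comes from.
    The directory and everything in it are deleted when the with-block exits,
    including when an exception is raised inside it.
    """
    with tempfile.TemporaryDirectory(prefix="ocw-data-") as tmp:
        paths = {}
        for filename, url in sources.items():
            target = os.path.join(tmp, filename)
            # Fetch the file and save it under the name the example expects
            urllib.request.urlretrieve(url, target)
            paths[filename] = target
        yield paths
```

To use it, write `with fetched_files({"trmm_example.nc": url}) as paths:` and pass `paths["trmm_example.nc"]` to the example or test. The filename here is only illustrative. Nothing is left on disk once the block ends.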
> >>>>>>>>>>
> >>>>>>>>>>>     --
> >>>>>>>>>>>
> >>>>>>>>>>>   -----------------------------------------------------
> >>>>>>>>>>>
> >>>>>>>>>> Denis Nadeau, (CSC)
> >>>>>>>>>> NCCS (NASA Center for Climate Simulation)
> >>>>>>>>>> NASA Goddard Space Flight Center
> >>>>>>>>>> Mailcode 606.2
> >>>>>>>>>> 8800 Greenbelt Road
> >>>>>>>>>> Greenbelt, MD 20771
> >>>>>>>>>> Email: denis.nadeau@nasa.gov
> >>>>>>>>>> Phone: (301) 286-7286           Fax: 301.286.1634
> >>>>>>>>>> -----------------------------------------------------
>
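Joyce's first proposal in the quoted thread (host binaries outside the repo, download NetCDF files only when a test or example needs them, and clean them up afterwards) could be handled by a small shell helper like the sketch below. It assumes `curl` is installed. The function name `with_test_data` and the example URL are hypothetical placeholders, not part of the OCW code base.

```shell
#!/bin/sh
# Sketch: download one data file into a throwaway directory, run a command
# on it, then delete the directory whatever the command's outcome.
# with_test_data and the URL in the usage line are hypothetical.

with_test_data() (
    url="$1"; shift
    tmpdir=$(mktemp -d) || exit 1
    file="$tmpdir/$(basename "$url")"

    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # -L: follow redirects
    if ! curl -fsSL -o "$file" "$url"; then
        rm -rf "$tmpdir"
        exit 1
    fi

    # Run the caller's command with the downloaded file as its last argument
    "$@" "$file"
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)
# The function body is a ( ... ) subshell, so its variables do not leak
# into the caller and "exit" only ends the subshell, not the script.

# Usage (hypothetical URL and test runner):
# with_test_data https://example.org/ocw-test-data/sample.nc python run_tests.py
```

Keeping the download and the cleanup in one function means a failing test does not leave a stray .nc file in the working tree.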

Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
Update on this. I've tested this on a fork to make sure that it works
properly. Everything seems fine. I've taken the old repository and removed
all the old large binaries and leftover JPL related things. I've gone
through and run some tests, including a sample evaluation through the UI
and the backend, and everything seems kosher.

The repo is down to ~13 MB now.

It would be a big help if everyone could go pull down the new code [1] and
make sure everything looks alright. I'll wait a bit before pushing any
changes to the ASF so everyone has adequate time to test.

[1] https://github.com/MJJoyce/climate




-- Joyce
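The cleanup described in this update (dropping the old large binaries so the repo shrinks from ~500 MB to ~13 MB) follows a common pattern. First, find the largest blobs anywhere in history. Then rewrite every commit without them. The sketch below shows that pattern with stock git commands (`rev-list`, `cat-file`, `filter-branch`). It is not necessarily how git_diet or the actual CLIMATE cleanup did it, and the helper names are hypothetical. Run it only in a scratch clone: it rewrites every commit ID, which is why everyone needs a fresh copy afterwards.

```shell
#!/bin/sh
# Sketch: find the biggest files in a repository's history and purge one.
# largest_blobs and purge_path are hypothetical helper names.
# Run in a scratch clone: purge_path rewrites every commit on every ref.

# Print "blob <sha> <size> <path>" for the N largest blobs (default 10).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -k3,3nr |
        head -n "${1:-10}"
}

# Remove PATH from every commit, then drop the backup refs and reflog
# entries that still point at the old objects so gc can delete them.
purge_path() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm -r -q --cached --ignore-unmatch -- '$1'" \
        --prune-empty -- --all || return 1
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    git reflog expire --expire=now --all
    git gc --prune=now --quiet
}

# Typical session:
# largest_blobs 20              # spot the big .nc files, old slides, ...
# purge_path path/to/big.nc     # hypothetical path
# git count-objects -vH         # compare size-pack before and after
```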


On Wed, Mar 12, 2014 at 2:49 PM, Michael Joyce <jo...@apache.org> wrote:

> Awesome, I'm glad we got this sorted out. Thanks for all the hard work!
>
>
> -- Joyce
>
>
> On Wed, Mar 12, 2014 at 2:45 PM, denis.nadeau <de...@nasa.gov>wrote:
>
>> Happy to be your project guinea pig! :->
>>
>> After recompiling git with libcurl (./configure --with-curl) I was
>> able to push the changes. It seems you need curl to get access to https.
>> Your git remote command was very useful. I did not have to clone the
>> repo, copy my files over, and redo the git commands.
>>
>> I made quite a few changes to obs4MIPs and need to diff and push them
>> to the repository. So it seems that I am good to go with git. :-)
>>
>> Great work and thanks for your help!
>> Denis
>>
>>
>> On 3/12/14 5:01 PM, Michael Joyce wrote:
>>
>>> Ah good, we're getting close! We'd be even closer if I hadn't messed up
>>> in a previous git-related email!
>>>
>>> Our git://git.apache.org/climate.git mirror is our read-only git
>>> mirror. That would explain why you aren't able to write to it.
>>>
>>> We need to use:
>>> https://git-wip-us.apache.org/repos/asf/climate.git
>>>
>>> If we didn't have a commit bit we would instead use (http vs https):
>>> http://git-wip-us.apache.org/repos/asf/reponame.git
>>>
>>> I misread some documentation at [1] and [2] and confused myself. I
>>> thought the "WIP" or "Work in Progress" label was for migration only.
>>> Silly me.
>>>
>>> We can fix this fairly easily by running
>>> $ git remote set-url origin https://git-wip-us.apache.org/repos/asf/climate.git
>>>
>>> Then, you should see updated URLs with
>>> $ git remote -v
>>>
>>> At that point you should be able to push successfully.
>>>
>>> Sorry that you've turned into our project guinea pig Denis! I had hoped
>>> to smooth out some of these rough edges this last weekend/early this
>>> week but unfortunately I haven't been able to do so. We'll get there
>>> though!
>>>
>>> [1] https://www.apache.org/dev/writable-git
>>> [2] https://git-wip-us.apache.org/
>>>
>>>
>>> -- Joyce
>>>
>>>
>>> On Wed, Mar 12, 2014 at 12:44 PM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
>>>
>>>> Joyce,
>>>>
>>>> This is a great introduction and will help other SVN/CVS developers. (I
>>>> did not know you had to "git add" every change.)
>>>>
>>>> Right now, I just can't push to github.  I think it might be a
>>>> configuration issue.  Do you need my ssh keys or something for me to
>>>> 'push'?
>>>>
>>>>     git push origin master
>>>>
>>>>         fatal: The remote end hung up unexpectedly
>>>>
>>>>
>>>>     git status
>>>>
>>>>         # On branch master
>>>>         # Your branch is ahead of 'origin/master' by 4 commits.
>>>>         #
>>>>         nothing to commit (working directory clean)
>>>>
>>>>     git remote -v
>>>>
>>>>         origin  git://git.apache.org/climate.git (fetch)
>>>>         origin  git://git.apache.org/climate.git (push)
>>>>
>>>> Thanks for your help. (almost there...)
>>>> Denis
>>>>
>>>>
>>>> On 3/12/14 12:37 PM, Michael Joyce wrote:
>>>>
>>>>> Ah, let me explain since git is just a bit different from SVN.
>>>>>
>>>>> When you commit in git you aren't actually committing to the primary
>>>>> server like you are in SVN. You're committing to your local copy of the
>>>>> repository. In order to mirror those changes to the ASF you will need
>>>>> to run "git push". So "git status" is telling you that you've committed
>>>>> 4 times and those changes aren't mirrored on the server that you ran
>>>>> "git clone" from.
>>>>>
>>>>> To be safe, you might want to check out a clean copy of the repo from
>>>>> the ASF (which should only take forever =) and then try again. We could
>>>>> go through each of the commits and make sure they're the way you want
>>>>> them to be, but that might end up being more trouble than it's worth if
>>>>> we try to do it via email. This is the workflow that I would probably
>>>>> follow:
>>>>>
>>>>> # Remove the files that you don't want anymore. I'm going to say that
>>>>> # we're sitting in the root of our repo and the files are in
>>>>> # 'obs4MIPs/examples'
>>>>> $ git rm -r obs4MIPs/examples
>>>>> $ git status
>>>>> # You should now see a number of files being marked as "staged for
>>>>> # commit".
>>>>> # Go ahead and commit these removals
>>>>> $ git commit -m "Removing obs4MIPs example .nc files"
>>>>>
>>>>> # Now if you run git status you shouldn't see any files listed, but it
>>>>> # will say that you're ahead of origin/master by 1 commit
>>>>>
>>>>> # Now add the README and/or update any other files
>>>>>
>>>>> $ git add .
>>>>> # It's important to note that "add" in git is not the same as "add" in
>>>>> # svn. Add in git means "add/stage these changes for the next commit".
>>>>> # If you're used to svn this can be a bit confusing. In git you need to
>>>>> # add changes every time you want to commit, as opposed to svn where you
>>>>> # only "add" the file to the repo once.
>>>>>
>>>>> $ git status
>>>>> # You should see all the files that you changed present and "staged for
>>>>> # commit". When something is "staged for commit" that means that it
>>>>> # will be committed the next time we run git commit.
>>>>> $ git commit -m "Update blah blah blah"
>>>>>
>>>>> # Now you should see that you're ahead by a few commits, depending on
>>>>> # how many times you've committed.
>>>>> # At this point you probably want to share all your changes with
>>>>> # everyone, so we'll push the changes up to the server.
>>>>>
>>>>> # You really can abbreviate this to just 'git push' or 'git push
>>>>> # origin'. We're going to play it safe and be super explicit.
>>>>> # This is telling git to push all the changes that you've committed in
>>>>> # your 'master' branch (which is the default one that you've been
>>>>> # working in) to the remote named "origin". By default, the repo that
>>>>> # you cloned from is named "origin".
>>>>> $ git push origin master
>>>>>
>>>>> At this point we should get some emails saying you committed.
>>>>>
>>>>> Hopefully that helps a bit. If you have more questions let me know. The
>>>>> move from svn to git can certainly be a bit jarring. I'm working on
>>>>> writing up a brief "intro to git" that I will send around to the mailing
>>>>> list once it's in a useful state. It should hopefully help clear up some
>>>>> confusion for everyone.
>>>>>
>>>>>
>>>>> -- Joyce
>>>>>
>>>>>
>>>>> On Wed, Mar 12, 2014 at 9:04 AM, denis.nadeau <de...@nasa.gov> wrote:
>>>>>
>>>>>> Joyce,
>>>>>>
>>>>>> I did commit the change and also removed ("rm") the .nc files. I did
>>>>>> not see an email either. Here are the 3 commands I used:
>>>>>>
>>>>>> 1. git add
>>>>>> 2. git commit
>>>>>> 3. git rm
>>>>>>
>>>>>>
>>>>>> I guess "git rm" does not need a commit command.
>>>>>> When I run "git status" I get this message. I am not sure what "ahead
>>>>>> of 'origin/master' by 4 commits" means!
>>>>>>
>>>>>>      git status
>>>>>>      # On branch master
>>>>>>      # Your branch is ahead of 'origin/master' by 4 commits.
>>>>>>      #
>>>>>>      nothing to commit (working directory clean)
>>>>>>
>>>>>>
>>>>>> Denis
>>>>>>
>>>>>> On 3/12/14 11:19 AM, Michael Joyce wrote:
>>>>>>
>>>>>>> Awesome Denis, thanks much. I will play around with this more soon
>>>>>>> and see if I can strip out some more files. Did you push your changes
>>>>>>> up to the repo? I didn't see a commit email come through, but I'm not
>>>>>>> certain my filters are working correctly with the mailing list
>>>>>>> migrations.
>>>>>>>
>>>>>>>
>>>>>>> -- Joyce
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 12, 2014 at 7:20 AM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
>>>>>>>
>>>>>>>> Joyce,
>>>>>>>>
>>>>>>>> I deleted the .nc files found in my example directory for TRMM and
>>>>>>>> ECMWF. I have added a README file that explains to users how to
>>>>>>>> retrieve the data from the original data provider. TRMM is pretty
>>>>>>>> straightforward, but for ECMWF you need to register, obtain a key and
>>>>>>>> download their Python package.
>>>>>>>>
>>>>>>>> It works pretty well on my machine; let's see what users say.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/10/14 3:53 PM, Michael Joyce wrote:
>>>>>>>>
>>>>>>>>> I think that would be great Denis! I can go ahead and look at doing
>>>>>>>>> something similar for the other ocw/ocw-ui components as well. I'm
>>>>>>>>> sure this will help us out a good bit.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- Joyce
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 10, 2014 at 11:20 AM, denis.nadeau <denis.nadeau@nasa.gov> wrote:
>>>>>>>>>
>>>>>>>>>> Michael,
>>>>>>>>>>
>>>>>>>>>> I like the idea of having the NetCDF files in an external
>>>>>>>>>> repository. I was thinking that it might be better to point people
>>>>>>>>>> to the satellite data at the different DAACs so that they can
>>>>>>>>>> download the files directly. That would work for the obs4MIPs
>>>>>>>>>> program. I would feel better about it as well, since I have been
>>>>>>>>>> worried that some data providers (ECMWF) might tell us we are not
>>>>>>>>>> authorized to distribute their original data. I did not think about
>>>>>>>>>> this initially when I checked in my original code.
>>>>>>>>>>
>>>>>>>>>> I just found out that ECMWF now allows people to download their
>>>>>>>>>> data in NetCDF instead of GRIB using Python [1]. I tried it before,
>>>>>>>>>> but could only retrieve GRIB data and did not want to mess with
>>>>>>>>>> GrADS ctl files and the CDMS2/CDAT package. So now I could just
>>>>>>>>>> create a script to download the right files and rename them to the
>>>>>>>>>> appropriate filenames for the obs4MIPs examples.
>>>>>>>>>>
>>>>>>>>>> I would feel much better about this. Let me know what you think.
>>>>>>>>>>
>>>>>>>>>> [1] https://software.ecmwf.int/wiki/display/WEBAPI/Accessing+ECMWF+data+servers+in+batch
>>>>>>>>>>
>>>>>>>>>> Denis
>>>>>>>>>>
>>>>>>>>>> On 3/10/14 1:06 PM, Michael Joyce wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi guys,
>>>>>>>>>>>
>>>>>>>>>>> An unfortunate side effect of our export from SVN to Git is that
>>>>>>>>>>> we've ended up with a rather bloated repository. We've had a large
>>>>>>>>>>> number of binary files in our repo in the past and all of this has
>>>>>>>>>>> been rolled up into an obnoxious ~500 MB pack file. I've been
>>>>>>>>>>> completely unable to clone the repo on my home internet because it
>>>>>>>>>>> constantly times out, and it's painfully slow on my faster work
>>>>>>>>>>> connection.
>>>>>>>>>>>
>>>>>>>>>>> To fix this problem I suggest we do the following:
>>>>>>>>>>> - Remove all binary files from our repo and host them externally.
>>>>>>>>>>> For example, NetCDF files can be downloaded when they're needed and
>>>>>>>>>>> cleaned up afterwards (for tests or examples).
>>>>>>>>>>> - Remove all the bloat from our pack file. I was digging through
>>>>>>>>>>> stuff earlier and found a number of very large and outdated files
>>>>>>>>>>> in our pack file (~300 MB NC file, internal JPL presentations/files
>>>>>>>>>>> from a long time ago, etc.). We should be able to use [1] to help
>>>>>>>>>>> automate this for us, although we can also take care of it on our
>>>>>>>>>>> own if need be.
>>>>>>>>>>>
>>>>>>>>>>> Let me know what you guys think the best course of action is. That
>>>>>>>>>>> being said, dealing with this sooner rather than later would be
>>>>>>>>>>> nice =D
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/cmaitchison/git_diet
>>>>>>>>>>>
>>>>>>>>>>> -- Joyce
>>>>>>>>>>>
>
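The remove/add/commit/push workflow in the quoted 12:37 PM message can be collapsed into one shell function. The function also shows where the "ahead of 'origin/master' by N commits" count comes from. The function name `publish_example_cleanup` and the README text are hypothetical. The commands are the ones from the thread, plus `git rev-list --count`, which counts commits reachable from `HEAD` but not from `origin/master`.

```shell
#!/bin/sh
# Sketch of the workflow from the thread: remove the bundled .nc examples,
# add a README instead, commit locally, then publish with an explicit push.
# publish_example_cleanup is a hypothetical helper name.

publish_example_cleanup() {
    examples_dir="$1"                          # e.g. obs4MIPs/examples
    git rm -r -q -- "$examples_dir" || return 1
    git commit -q -m "Removing obs4MIPs example .nc files" || return 1

    mkdir -p "$examples_dir"
    printf 'Download the example data from the original providers.\n' \
        > "$examples_dir/README"
    git add -- "$examples_dir/README"          # every change must be staged
    git commit -q -m "Add README explaining how to fetch example data" || return 1

    # The number "git status" reports as "ahead of 'origin/master' by N"
    echo "ahead by $(git rev-list --count origin/master..HEAD) commit(s)"

    # Local commits only reach the shared server once they are pushed
    git push -q origin master
}
```

Usage, from the root of a clone whose `origin` is writable: `publish_example_cleanup obs4MIPs/examples`.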

Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
Awesome, I'm glad we got this sorted out. Thanks for all the hard work!


-- Joyce


On Wed, Mar 12, 2014 at 2:45 PM, denis.nadeau <de...@nasa.gov> wrote:

> Happy to be your project guinea pig ! :->
>
> After recompiling "git" with "libcurl" (./configure --with-curl) I was
> able to push the changes.  You need curl to get access to https (seems
> like).
> Your git remote command was very useful.  I did not have to clone the repo
> and copy my files over and redo the "git commands".
>
> I made quite some changes to obs4MIPs and need to "diff" and push changes
> to the repository.    So it seems that, I am good to go with "git". :-)
>
> Great work and thanks for your help!
> Denis
>
>
> On 3/12/14 5:01 PM, Michael Joyce wrote:
>
>> Ah good, we're getting close! We'd be even closer if I hadn't messed up in
>> a previous git related email!
>>
>> Our git://git.apache.org/climate.git mirror is our read only git mirror.
>> That would explain why you aren't able to write to it.
>>
>> We need to use:
>> https://git-wip-us.apache.org/repos/asf/climate.git
>>
>> If we didn't have a commit bit we would instead use (http vs https)
>> http://git-wip-us.apache.org/repos/asf/reponame.git
>>
>> I misread some documentation at [1] and [2] and confused myself. I thought
>> the "WIP" or "Work in Progress" label was for migration only. Silly me.
>>
>> We can fix this fairly easily by running
>> $ git remote set-url origin
>> https://git-wip-us.apache.org/repos/asf/climate.gi
>>
>> Then, you should see updated URLs with
>> $ git remote -v
>>
>> At that point you should be able to push successfully.
>>
>> Sorry that you've turned into our project guinea pig Denis! I had hoped to
>> smooth out some of these rough edges this last weekend/early this week but
>> unfortunately I haven't been able to do so. We'll get there though!
>>
>> [1] https://www.apache.org/dev/writable-git
>> [2] https://git-wip-us.apache.org/
>>
>>
>> -- Joyce
>>
>>
>> On Wed, Mar 12, 2014 at 12:44 PM, denis.nadeau <denis.nadeau@nasa.gov
>> >wrote:
>>
>>  Joyce,
>>>
>>> This is great introduction and will help other SVN/CVS developers.   (I
>>> did not know you had to "git add" every changes.)
>>>
>>> Right now, I just can't push to github.  I think it might be a
>>> configuration issue.  Do you need my ssh keys or something for me to
>>> 'push'?
>>>
>>>     git push origin master
>>>
>>>         fatal: The remote end hung up unexpectedly
>>>
>>>
>>>     git status
>>>
>>>         # On branch master
>>>         # Your branch is ahead of 'origin/master' by 4 commits.
>>>         #
>>>         nothing to commit (working directory clean)
>>>
>>>     git remote -v
>>>
>>>         origin  git://git.apache.org/climate.git (fetch)
>>>         origin  git://git.apache.org/climate.git (push)
>>>
>>> Thanks for your help. (almost there...)
>>> Denis
>>>
>>>
>>> On 3/12/14 12:37 PM, Michael Joyce wrote:
>>>
>>>  Ah, let me explain since git is just a bit different from SVN.
>>>>
>>>> When you commit in git you aren't actually committing to the primary
>>>> server
>>>> like you are in SVN. You're committing to your local working copy. In
>>>> order
>>>> to mirror those changes to the ASF you will need to run "git push". So
>>>> "git
>>>> status" is telling you that you've committed 4 times and those changes
>>>> aren't mirrored on the server that you ran "git clone" from.
>>>>
>>>> To be safe, you might want to checkout a clean copy of the repo from the
>>>> ASF (which should only take forever =) and then try again. We could go
>>>> through each of the commits and make sure they're the way you want them
>>>> to
>>>> be, but that might end up being more trouble than it's worth if we try
>>>> to
>>>> do it via email. This is the workflow that I would probably follow:
>>>>
>>>> # Remove the files that you don't want anymore. I'm going to say that
>>>> we're
>>>> # sitting in the root of our repo and the files are in
>>>> '/obs4MIPs/examples'
>>>> $ git rm -r obs4MIPs/examples
>>>> $ git status
>>>> # You should now see a number of files being marked as "staged for
>>>> commit".
>>>> # Go ahead commit these removals
>>>> $ git commit -m "Removing obs4MIPs example .nc files"
>>>>
>>>> # Now if you run git status you shouldn't see any files listed, but it
>>>> will
>>>> say
>>>> # that you're ahead of origin/master by 1 commit
>>>>
>>>> # Now add the readme/or update any other files
>>>>
>>>> $ git add .
>>>> # It's important to note that "add" in git is not the same as "add" in
>>>> svn.
>>>> Add in git means
>>>> # "add/stage these changes for the next commit". If you're used to svn
>>>> this
>>>> can be a bit
>>>> # confusing. In git you need to add changes every time you want to
>>>> commit,
>>>> as opposed
>>>> # to svn where you only "add" the file to the repo once.
>>>>
>>>> $ git status
>>>> # You should see all the files that you changed present and "staged for
>>>> commit". When
>>>> # something is "staged for commit" that means that it will be committed
>>>> next time we
>>>> # run git commit.
>>>> $ git commit -m "Update blah blah blah"
>>>>
>>>> # Now you should see that you're ahead by a few commits depending on how
>>>> many times you've committed.
>>>> # At this point you probably want to share all your changes with
>>>> everyone,
>>>> so we'll push the changes up to the server.
>>>>
>>>> # You really can abbreviate this to just 'git push' or 'git push
>>>> origin'.
>>>> We're going to play it safe and be super explicit.
>>>> # This is telling git to push all the changes that you've committed in
>>>> your
>>>> 'master' branch
>>>> # (which is the default one that you've been working in) to the remote
>>>> named "origin". By default,
>>>> # the repo that you cloned from is named "origin".
>>>> $ git push origin master
>>>>
>>>> At this point we should get some emails saying you committed.
>>>>
>>>> Hopefully that helps a bit. If you have more questions let me know. It
>>>> can
>>>> certainly be a bit jarring of a change moving to git from svn. I'm
>>>> working
>>>> on writing up a brief "intro to git" that I will send around to the
>>>> mailing
>>>> list once it's in a useful state. It should hopefully help clear up some
>>>> confusion for everyone.
>>>>
>>>>
>>>> -- Joyce
>>>>
>>>>
>>>> On Wed, Mar 12, 2014 at 9:04 AM, denis.nadeau <de...@nasa.gov>
>>>> wrote:
>>>>
>>>>   Joyce,
>>>>
>>>>> I did commit the change and also remove "rm" the .nc files.  I did not
>>>>> see
>>>>> an email either.   Here are the 3 commands I used
>>>>>
>>>>> 1. git add
>>>>> 2. git commit
>>>>> 3. git rm
>>>>>
>>>>>
>>>>> I guess "git rm" does not need a commit command.
>>>>> When I run "git status" I get this message. I am not sure what "ahead
>>>>> of
>>>>> 'origin/master' by 4 commits" means!
>>>>>
>>>>>      git status
>>>>>      # On branch master
>>>>>      # Your branch is ahead of 'origin/master' by 4 commits.
>>>>>      #
>>>>>      nothing to commit (working directory clean)
>>>>>
>>>>>
>>>>> Denis
>>>>>
>>>>> On 3/12/14 11:19 AM, Michael Joyce wrote:
>>>>>
>>>>>   Awesome Denis thanks much. I will play around with this more soon and
>>>>>
>>>>>> see
>>>>>> if I can't strip out some more files. Did you push your changes up to
>>>>>> the
>>>>>> repo? I didn't see a commit email come through, but I'm not certain my
>>>>>> filters are working correctly with the mailing list migrations.
>>>>>>
>>>>>>
>>>>>> -- Joyce
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 12, 2014 at 7:20 AM, denis.nadeau <de...@nasa.gov>
>>>>>> wrote:
>>>>>>
>>>>>>    Joyce,
>>>>>>
>>>>>>  I deleted the .nc files found in my example directory for TRMM and
>>>>>>> ECMWF.
>>>>>>>     I have installed a README file and explain users how to retrieve
>>>>>>> the
>>>>>>> data
>>>>>>> from the original data provider.    TRMM is pretty straightforward,
>>>>>>> but
>>>>>>> for
>>>>>>> ECMWF you need to register, obtain a key and download their Python
>>>>>>> package.
>>>>>>>
>>>>>>> It works pretty well on my machine, let see what users say.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Denis
>>>>>>>
>>>>>>>
>>>>>>> On 3/10/14 3:53 PM, Michael Joyce wrote:
>>>>>>>
>>>>>>>    I think that would be great Denis! I can go ahead and look at
>>>>>>> doing
>>>>>>>
>>>>>>>  something similar for the other ocw/ocw-ui components as well. I'm
>>>>>>>> sure
>>>>>>>> this will help us out a good bit.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Joyce
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 10, 2014 at 11:20 AM, denis.nadeau <
>>>>>>>> denis.nadeau@nasa.gov
>>>>>>>>
>>>>>>>>   wrote:
>>>>>>>>
>>>>>>>>>      Michael,
>>>>>>>>>
>>>>>>>>   I like the idea of having the NetCDF files in a external
>>>>>>>> repository.
>>>>>>>>
>>>>>>>>> I was thinking that it might be better to point the people to
>>>>>>>>> satellite
>>>>>>>>> data at the different DAACs so that they can download the files
>>>>>>>>> directly.
>>>>>>>>> That would work for the "obs4MIPs" program.     I would feel better
>>>>>>>>> about
>>>>>>>>> it as well,   I have been worried to be told by some data providers
>>>>>>>>> (ECMWF)
>>>>>>>>> that we are not authorized to distribute their original data.   I
>>>>>>>>> initially
>>>>>>>>> did not think about this when I checked in my original code.
>>>>>>>>>
>>>>>>>>> I just found out that ECMWF now allows people to download their
>>>>>>>>> data
>>>>>>>>> in
>>>>>>>>> "NetCDF" instead of "GRIB" using Python [1].   I tried it before,
>>>>>>>>> but
>>>>>>>>> could
>>>>>>>>> only retrieve GRIB data and did not want to mess with "Grads" ctl
>>>>>>>>> files
>>>>>>>>> and
>>>>>>>>> CDMS2/CDAT package.    So now, I could just create a script to
>>>>>>>>> download
>>>>>>>>> the
>>>>>>>>> right files and rename them to the appropriate filenames for
>>>>>>>>> obs4MIPs
>>>>>>>>> examples.
>>>>>>>>>
>>>>>>>>> I would feel much better about this.   Let me know what you think.
>>>>>>>>>
>>>>>>>>> [1] https://software.ecmwf.int/wiki/display/WEBAPI/Accessing+
>>>>>>>>> ECMWF+data+servers+in+batch
>>>>>>>>>
>>>>>>>>> Denis
>>>>>>>>>
>>>>>>>>> On 3/10/14 1:06 PM, Michael Joyce wrote:
>>>>>>>>>
>>>>>>>>>     Hi guys,
>>>>>>>>>
>>>>>>>>>   An unfortunate side effect of our export from SVN to Git is that
>>>>>>>>>
>>>>>>>>>> we've
>>>>>>>>>> ended up with a rather bloated repository. We've had a large
>>>>>>>>>> number
>>>>>>>>>> of
>>>>>>>>>> binary files in our repo in the past and all of this has been
>>>>>>>>>> rolled
>>>>>>>>>> up
>>>>>>>>>> into a obnoxious ~500 MB pack file. I've been completely unable to
>>>>>>>>>> clone
>>>>>>>>>> the repo on my home internet because it constantly times out and
>>>>>>>>>> it's
>>>>>>>>>> painfully slow on my faster work connection.
>>>>>>>>>>
>>>>>>>>>> To fix this problem I suggest we do the following:
>>>>>>>>>> - Remove all binary files from our repo and host them externally.
>>>>>>>>>> For
>>>>>>>>>> example, NetCDF files can be downloaded when they're needed and
>>>>>>>>>> cleaned
>>>>>>>>>> up
>>>>>>>>>> afterwards (for tests or examples).
>>>>>>>>>> - Remove all the bloat from our pack file. I was digging through
>>>>>>>>>> stuff
>>>>>>>>>> earlier and found a number of very large and outdated files in our
>>>>>>>>>> pack
>>>>>>>>>> file (~300 MB NC file, internal JPL presentations/files from a
>>>>>>>>>> long
>>>>>>>>>> time
>>>>>>>>>> ago, etc.). We should be able to use [1] to help automate this for
>>>>>>>>>> us,
>>>>>>>>>> although we can also take care of it on our own if need be.
>>>>>>>>>>
>>>>>>>>>> Let me know what you guys think the best course of action is. That
>>>>>>>>>> being
>>>>>>>>>> said, dealing with this sooner rather than later would be nice =D
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/cmaitchison/git_diet
>>>>>>>>>>
>>>>>>>>>> -- Joyce
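The original proposal quoted above mentions a ~300 MB NC file and old presentations hiding in the pack file. Before reaching for a history-rewriting tool such as git_diet, it helps to see which blobs are actually large. The sketch below does that with standard plumbing commands. The function name `largest_blobs` is our own, and we have not checked how git_diet itself finds these files.

```shell
#!/bin/sh
# Sketch: list the N largest blobs reachable from any ref, as "size sha path".
# Assumes a git recent enough for `cat-file --batch-check` format atoms.
largest_blobs() {
    n="${1:-10}"
    # rev-list prints every reachable object, with a path for trees/blobs;
    # cat-file adds type and size; keep blobs only, biggest first.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k1,1nr |
        head -n "$n"
}
```

For example, `largest_blobs 20` prints the 20 biggest blobs anywhere in history, including files already deleted from the working tree. Actually removing them (with git_diet, or with something like `git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path>' -- --all`) rewrites every commit ID, so everyone has to re-clone afterwards. The pack only shrinks once the old references are gone too (filter-branch keeps backups under `refs/original/`, and the reflog still points at the old commits) and `git gc --prune=now` has run.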

Re: Our new repository is a bit bloated

Posted by "denis.nadeau" <de...@nasa.gov>.
Happy to be your project guinea pig! :->

After recompiling "git" with "libcurl" (./configure --with-curl), I was
able to push the changes. It seems you need curl to access the repository
over https.
Your "git remote" command was very useful. I did not have to re-clone the
repo, copy my files over and redo the "git" commands.
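Denis's push worked once git was rebuilt with libcurl, which provides git's http(s) transport. To check a given build before rebuilding, one option is to look for the `git-remote-https` helper. The sketch below assumes the usual layout, where remote helpers live in the directory printed by `git --exec-path`; `git_has_https` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: report whether this git build can use https:// remotes.
# A git built without libcurl ships no git-remote-https helper.
git_has_https() {
    [ -x "$(git --exec-path)/git-remote-https" ]
}

if git_has_https; then
    echo "https transport available"
else
    echo "no https helper: rebuild git with ./configure --with-curl" >&2
fi
```

Without that helper, https remotes typically fail with an error saying git cannot find a remote helper for 'https'.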

I made quite a few changes to obs4MIPs and need to "diff" them and push
them to the repository. So it seems I am good to go with "git". :-)
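Commits stay local until pushed, so the changes Denis mentions can be reviewed before pushing. The way to do this is to compare `master` with `origin/master`, which records the server state as of the last fetch or push. This is a minimal sketch that assumes the `origin`/`master` names used throughout this thread. `show_unpushed` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: review what `git push origin master` would send.
show_unpushed() {
    branch="${1:-master}"
    upstream="origin/$branch"
    # Same number git status reports as "ahead of 'origin/master' by N".
    echo "ahead by $(git rev-list --count "$upstream..$branch") commit(s)"
    # One line per local-only commit.
    git log --oneline "$upstream..$branch"
    # Per-file summary of changes made on the branch since it diverged.
    git diff --stat "$upstream...$branch"
}
```

Running `show_unpushed master` in a clone like Denis's should report the same count that `git status` shows.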

Great work and thanks for your help!
Denis

On 3/12/14 5:01 PM, Michael Joyce wrote:
> Ah good, we're getting close! We'd be even closer if I hadn't messed up in
> a previous git related email!
>
> Our git://git.apache.org/climate.git mirror is read-only. That would
> explain why you aren't able to push to it.
>
> We need to use:
> https://git-wip-us.apache.org/repos/asf/climate.git
>
> If we didn't have a commit bit, we would instead use plain http (note http
> vs https):
> http://git-wip-us.apache.org/repos/asf/reponame.git
>
> I misread some documentation at [1] and [2] and confused myself. I thought
> the "WIP" or "Work in Progress" label was for migration only. Silly me.
>
> We can fix this fairly easily by running
> $ git remote set-url origin https://git-wip-us.apache.org/repos/asf/climate.git
>
> Then, you should see updated URLs with
> $ git remote -v
>
> At that point you should be able to push successfully.
>
> Sorry that you've turned into our project guinea pig Denis! I had hoped to
> smooth out some of these rough edges this last weekend/early this week but
> unfortunately I haven't been able to do so. We'll get there though!
>
> [1] https://www.apache.org/dev/writable-git
> [2] https://git-wip-us.apache.org/
>
>
> -- Joyce
>


-- 
-----------------------------------------------------
Denis Nadeau, (CSC)
NCCS (NASA Center for Climate Simulation)
NASA Goddard Space Flight Center
Mailcode 606.2
8800 Greenbelt Road
Greenbelt, MD 20771
Email: denis.nadeau@nasa.gov
Phone: (301) 286-7286           Fax: 301.286.1634
-----------------------------------------------------


Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
Ah good, we're getting close! We'd be even closer if I hadn't messed up in
a previous git related email!

Our git://git.apache.org/climate.git mirror is read-only. That would
explain why you aren't able to push to it.

We need to use:
https://git-wip-us.apache.org/repos/asf/climate.git

If we didn't have a commit bit, we would instead use plain http (note http
vs https):
http://git-wip-us.apache.org/repos/asf/reponame.git

I misread some documentation at [1] and [2] and confused myself. I thought
the "WIP" or "Work in Progress" label was for migration only. Silly me.

We can fix this fairly easily by running
$ git remote set-url origin https://git-wip-us.apache.org/repos/asf/climate.git

Then, you should see updated URLs with
$ git remote -v

At that point you should be able to push successfully.
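Because `git remote set-url` only edits the clone's configuration, nothing has to be re-cloned. The sketch below rehearses the whole sequence offline. Two local bare repositories stand in for the read-only mirror and the writable https repository. The paths are throwaway stand-ins, not ASF endpoints.

```shell
#!/bin/sh
# Offline rehearsal of "repoint origin, verify, push".
set -e
tmp=$(mktemp -d)
mirror="$tmp/mirror.git"     # plays git://git.apache.org/climate.git
writable="$tmp/writable.git" # plays https://git-wip-us.apache.org/repos/asf/climate.git

# Give both "servers" the same starting history.
git init -q "$tmp/seed"
git -C "$tmp/seed" -c user.name=seed -c user.email=seed@example.com \
    commit -q --allow-empty -m "initial import"
git clone -q --bare "$tmp/seed" "$mirror"
git clone -q --bare "$tmp/seed" "$writable"

# A working copy cloned from the mirror, with one local commit.
git clone -q "$mirror" "$tmp/work"
cd "$tmp/work"
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "local change"

# Repoint origin instead of re-cloning, then check the fetch/push URLs.
git remote set-url origin "$writable"
git remote -v

# The push now lands in the writable repository; the mirror is untouched.
git push -q origin HEAD
```

Against the real repository, the main differences are the URL passed to `set-url` and the credential prompt on push.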

Sorry that you've turned into our project guinea pig, Denis! I had hoped to
smooth out some of these rough edges this past weekend/early this week, but
unfortunately I haven't been able to. We'll get there though!

[1] https://www.apache.org/dev/writable-git
[2] https://git-wip-us.apache.org/


-- Joyce


On Wed, Mar 12, 2014 at 12:44 PM, denis.nadeau <de...@nasa.gov>wrote:

> Joyce,
>
> This is a great introduction and will help other SVN/CVS developers. (I
> did not know you had to "git add" every change.)
>
> Right now, I just can't push to github.  I think it might be a
> configuration issue.  Do you need my ssh keys or something for me to 'push'?
>
>    git push origin master
>
>        fatal: The remote end hung up unexpectedly
>
>
>    git status
>
>        # On branch master
>        # Your branch is ahead of 'origin/master' by 4 commits.
>        #
>        nothing to commit (working directory clean)
>
>    git remote -v
>
>        origin  git://git.apache.org/climate.git (fetch)
>        origin  git://git.apache.org/climate.git (push)
>
> Thanks for your help. (almost there...)
> Denis

Re: Our new repository is a bit bloated

Posted by "denis.nadeau" <de...@nasa.gov>.
Joyce,

This is a great introduction and will help other SVN/CVS developers. (I
did not know you had to "git add" every change.)

Right now, I just can't push to github.  I think it might be a 
configuration issue.  Do you need my ssh keys or something for me to 'push'?

    git push origin master

        fatal: The remote end hung up unexpectedly

    git status

        # On branch master
        # Your branch is ahead of 'origin/master' by 4 commits.
        #
        nothing to commit (working directory clean)

    git remote -v

        origin  git://git.apache.org/climate.git (fetch)
        origin  git://git.apache.org/climate.git (push)
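Both URLs in the `git remote -v` output above point at git://git.apache.org/climate.git, which is the project's read-only mirror. That means the push has nowhere to go. A small guard can catch this before pushing. The sketch below assumes that a `git://` push URL always means a read-only mirror, which is true here but not universally. `check_push_url` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: refuse to push when the remote's push URL is a git:// mirror.
check_push_url() {
    remote="${1:-origin}"
    # An explicit pushurl wins if configured; otherwise git pushes to url.
    url=$(git config --get "remote.$remote.pushurl" ||
          git config --get "remote.$remote.url")
    case "$url" in
        "")
            echo "no URL configured for remote '$remote'" >&2
            return 1 ;;
        git://*)
            echo "$remote pushes to read-only $url; run git remote set-url" >&2
            return 1 ;;
        *)
            echo "$remote pushes to $url"
            return 0 ;;
    esac
}
```

Usage: `check_push_url origin && git push origin master`.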

Thanks for your help. (almost there...)
Denis
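The workflow quoted below hinges on the point Denis was unsure about. `git rm` only stages a deletion, and `git commit` records it locally. The server sees nothing until `git push`. This offline sketch uses a throwaway local bare repository in place of the ASF server.

```shell
#!/bin/sh
# Offline demo: rm stages, commit records locally, push publishes.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/seed"
cd "$tmp/seed"
echo data > example.nc
git add example.nc
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add example"
git clone -q --bare "$tmp/seed" "$tmp/server.git"  # stands in for the ASF repo
git clone -q "$tmp/server.git" "$tmp/work"
cd "$tmp/work"

git rm -q example.nc
git status --short           # "D  example.nc": deletion staged, not committed

git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Remove example .nc file"
git status -sb | head -n 1   # branch line now ends with "[ahead 1]"

git push -q origin HEAD      # only now does the server see the removal
```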

On 3/12/14 12:37 PM, Michael Joyce wrote:
> Ah, let me explain since git is just a bit different from SVN.
>
> When you commit in git you aren't actually committing to the primary server
> like you are in SVN. You're committing to your local working copy. In order
> to mirror those changes to the ASF you will need to run "git push". So "git
> status" is telling you that you've committed 4 times and those changes
> aren't mirrored on the server that you ran "git clone" from.
>
> To be safe, you might want to check out a clean copy of the repo from the
> ASF (which should only take forever =) and then try again. We could go
> through each of the commits and make sure they're the way you want them to
> be, but that might end up being more trouble than it's worth if we try to
> do it via email. This is the workflow that I would probably follow:
>
> # Remove the files that you don't want anymore. I'm going to say that we're
> # sitting in the root of our repo and the files are in 'obs4MIPs/examples'
> $ git rm -r obs4MIPs/examples
> $ git status
> # You should now see a number of files being marked as "staged for commit".
> # Go ahead and commit these removals
> $ git commit -m "Removing obs4MIPs example .nc files"
>
> # Now if you run git status you shouldn't see any files listed, but it will
> # say that you're ahead of origin/master by 1 commit
>
> # Now add the readme and/or update any other files
>
> $ git add .
> # It's important to note that "add" in git is not the same as "add" in svn.
> # Add in git means "add/stage these changes for the next commit". If you're
> # used to svn this can be a bit confusing. In git you need to add changes
> # every time you want to commit, as opposed to svn where you only "add" the
> # file to the repo once.
>
> $ git status
> # You should see all the files that you changed present and "staged for
> # commit". When something is "staged for commit" that means that it will be
> # committed next time we run git commit.
> $ git commit -m "Update blah blah blah"
>
> # Now you should see that you're ahead by a few commits depending on how
> # many times you've committed.
> # At this point you probably want to share all your changes with everyone,
> # so we'll push the changes up to the server.
>
> # You really can abbreviate this to just 'git push' or 'git push origin'.
> # We're going to play it safe and be super explicit.
> # This is telling git to push all the changes that you've committed in your
> # 'master' branch (which is the default one that you've been working in) to
> # the remote named "origin". By default, the repo that you cloned from is
> # named "origin".
> $ git push origin master
>
> At this point we should get some emails saying you committed.
>
> Hopefully that helps a bit. If you have more questions let me know. It can
> certainly be a bit jarring of a change moving to git from svn. I'm working
> on writing up a brief "intro to git" that I will send around to the mailing
> list once it's in a useful state. It should hopefully help clear up some
> confusion for everyone.
>
>
> -- Joyce
>
>
> On Wed, Mar 12, 2014 at 9:04 AM, denis.nadeau <de...@nasa.gov> wrote:
>
>> Joyce,
>>
>> I did commit the change and also remove "rm" the .nc files.  I did not see
>> an email either.   Here are the 3 commands I used
>>
>> 1. git add
>> 2. git commit
>> 3. git rm
>>
>>
>> I guess "git rm" does not need a commit command.
>> When I run "git status" I get this message. I am not sure what "ahead of
>> 'origin/master' by 4 commits" means!
>>
>>     git status
>>     # On branch master
>>     # Your branch is ahead of 'origin/master' by 4 commits.
>>     #
>>     nothing to commit (working directory clean)
>>
>>
>> Denis
>>




Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
Ah, let me explain, since git is a bit different from SVN.

When you commit in git you aren't actually committing to the primary server
like you are in SVN. You're committing to your local copy of the repository
(your clone). In order to mirror those changes to the ASF you will need to run
"git push". So "git status" is telling you that you've made 4 commits locally
and those changes aren't yet on the server that you ran "git clone" from.
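If you want to see exactly which commits "ahead of 'origin/master' by 4 commits" refers to, you can ask git for the commits that are on your branch but not on its upstream. Below is a minimal sketch you can run safely: it builds a throwaway sandbox with a local "server" repository and a clone, standing in for the ASF repo. The paths, file names and committer identity are hypothetical. `@{u}` is git's shorthand for the upstream of the current branch, which is origin/master in this thread.

```shell
#!/bin/sh
# Sketch: reproduce "Your branch is ahead of 'origin/...' by N commits" in a
# throwaway sandbox, then list the commits that have not been pushed yet.
set -e

sandbox=$(mktemp -d)

# Hypothetical stand-in for the ASF server: a repository with one commit.
git init -q "$sandbox/server"
cd "$sandbox/server"
git config user.name "Example Dev"          # sandbox-only identity
git config user.email "dev@example.org"
echo "readme" > README
git add README
git commit -q -m "Initial commit"

# Clone it, as you would with "git clone" from the ASF.
git clone -q "$sandbox/server" "$sandbox/clone"
cd "$sandbox/clone"
git config user.name "Example Dev"
git config user.email "dev@example.org"

# Two local commits: they exist only in this clone until "git push" is run.
echo "one" > a.txt && git add a.txt && git commit -q -m "Local change 1"
echo "two" > b.txt && git add b.txt && git commit -q -m "Local change 2"

git status                                  # reports the branch is ahead by 2 commits

# "@{u}" is the upstream of the current branch (origin/master in the thread).
git --no-pager log --oneline @{u}..HEAD     # the unpushed commits themselves
ahead=$(git rev-list --count @{u}..HEAD)
echo "ahead by $ahead commits"
```

The `@{u}..HEAD` range means "reachable from HEAD but not from the upstream", which is exactly the set that `git status` is counting.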

To be safe, you might want to check out a clean copy of the repo from the
ASF (which should only take forever =) and then try again. We could go
through each of the commits and make sure they're the way you want them to
be, but that might end up being more trouble than it's worth if we try to
do it via email. This is the workflow that I would probably follow:

# Remove the files that you don't want anymore. I'm going to assume that we're
# sitting in the root of our repo and the files are in 'obs4MIPs/examples'.
$ git rm -r obs4MIPs/examples
$ git status
# You should now see the removed files listed under "Changes to be committed",
# i.e. staged for the next commit. Go ahead and commit these removals.
$ git commit -m "Removing obs4MIPs example .nc files"

# Now if you run git status you shouldn't see any files listed, but it will
# say that you're ahead of origin/master by 1 commit.

# Now add the README and/or update any other files.

$ git add .
# It's important to note that "add" in git is not the same as "add" in svn.
# Add in git means "add/stage these changes for the next commit". If you're
# used to svn this can be a bit confusing. In git you need to add changes every
# time you want to commit, as opposed to svn where you only "add" the file to
# the repo once.

$ git status
# You should see all the files that you changed listed as "staged for commit"
# (under "Changes to be committed"). Anything staged will be committed the next
# time we run git commit.
$ git commit -m "Update blah blah blah"

# Now you should see that you're ahead by a few commits, depending on how many
# times you've committed. At this point you probably want to share all your
# changes with everyone, so we'll push the changes up to the server.

# You can abbreviate this to just 'git push' or 'git push origin'. We're going
# to play it safe and be super explicit. This tells git to push all the changes
# that you've committed in your 'master' branch (the default one that you've
# been working in) to the remote named "origin". By default, the repo that you
# cloned from is named "origin".
$ git push origin master

At this point we should get some emails saying you committed.
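On the guess that "git rm" does not need a commit: it does. `git rm` only stages the deletion, `git commit` records it in your clone, and only `git push` sends it to the server. Here is a minimal sandbox sketch of that sequence. It assumes a local bare repository standing in for the ASF server and a hypothetical `obs4MIPs/examples/sample.nc`. It pushes with `git push origin HEAD` (the current branch, under the same name on the remote). That is equivalent to `git push origin master` when you are on master, but it does not depend on the sandbox's default branch name.

```shell
#!/bin/sh
# Sketch: "git rm" stages a deletion, "git commit" records it locally,
# "git push" publishes it. Everything lives in a throwaway sandbox.
set -e

sandbox=$(mktemp -d)

# Seed a repository with a hypothetical example data file.
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example Dev"
git config user.email "dev@example.org"
mkdir -p obs4MIPs/examples
echo "not really NetCDF" > obs4MIPs/examples/sample.nc
echo "readme" > README
git add .
git commit -q -m "Initial commit"

# A bare copy plays the role of the shared server; clone it to work in.
git clone -q --bare "$sandbox/seed" "$sandbox/server.git"
git clone -q "$sandbox/server.git" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Example Dev"
git config user.email "dev@example.org"

git rm -r -q obs4MIPs/examples
staged=$(git diff --cached --name-only)     # the deletion is staged, not committed
echo "staged: $staged"

git commit -q -m "Removing obs4MIPs example .nc files"
echo "ahead by $(git rev-list --count @{u}..HEAD) commit(s) before pushing"

git push -q origin HEAD
echo "ahead by $(git rev-list --count @{u}..HEAD) commit(s) after pushing"
```

After the push the remote-tracking branch is updated as well, so the "ahead" count drops back to zero and commit emails would go out from the real server.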

Hopefully that helps a bit. If you have more questions let me know. Moving
from svn to git can certainly be a bit jarring. I'm working on writing up a
brief "intro to git" that I will send around to the mailing list once it's
in a useful state. It should hopefully help clear up some confusion for
everyone.


-- Joyce



Re: Our new repository is a bit bloated

Posted by "denis.nadeau" <de...@nasa.gov>.
Joyce,

I did commit the change and also removed ("git rm") the .nc files. I did not
see an email either. Here are the 3 commands I used:

 1. git add
 2. git commit
 3. git rm


I guess "git rm" does not need a commit command.
When I run "git status" I get this message. I am not sure what "ahead of 
'origin/master' by 4 commits" means!

    git status
    # On branch master
    # Your branch is ahead of 'origin/master' by 4 commits.
    #
    nothing to commit (working directory clean)


Denis




Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
Awesome, Denis, thanks much. I will play around with this more soon and see
if I can't strip out some more files. Did you push your changes up to the
repo? I didn't see a commit email come through, but I'm not certain my
filters are working correctly with the mailing list migrations.
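A file deleted in a new commit does not make the clone any smaller, because every earlier commit still contains it. Stripping it out means rewriting history. The sketch below does this on a throwaway sandbox containing a hypothetical `big.nc`. It uses git's built-in `git filter-branch --index-filter`, plus the cleanup steps from the filter-branch documentation's "shrinking a repository" checklist. This is a swapped-in illustration, not a description of how git_diet (mentioned at the start of this thread) works internally. Rewriting history changes every commit ID, so on a shared repo everyone would need to re-clone afterwards. Current git documentation also recommends git filter-repo over filter-branch.

```shell
#!/bin/sh
# Sketch: remove a file from every commit, then purge it from the object store.
set -e

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example Dev"
git config user.email "dev@example.org"

dd if=/dev/zero of=big.nc bs=1024 count=1024 2>/dev/null   # hypothetical data file
echo "print('hello')" > main.py
git add big.nc main.py
git commit -q -m "Add code and data"
big_blob=$(git rev-parse HEAD:big.nc)       # remember the blob's ID to check later
git rm -q big.nc
git commit -q -m "Remove data"              # the blob is still in the first commit

# Rewrite every commit on every ref so that big.nc never existed.
# (The variable skips filter-branch's pause that suggests using filter-repo.)
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch big.nc' -- --all >/dev/null

# filter-branch keeps backups under refs/original/, and reflogs still point at
# the old commits: drop both, then prune so the unreachable blob is deleted.
git for-each-ref --format='delete %(refname)' refs/original/ | git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$big_blob" 2>/dev/null; then
  echo "big.nc is still in the object store"
else
  echo "big.nc is gone from history"
fi
```

Skipping the refs/original and reflog steps is a common reason the pack stays the same size after a rewrite, because those references keep the old blobs reachable.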


-- Joyce



Re: Our new repository is a bit bloated

Posted by "denis.nadeau" <de...@nasa.gov>.
Joyce,

I deleted the .nc files in my example directory for TRMM and ECMWF. I have
added a README file that explains to users how to retrieve the data from the
original data provider. TRMM is pretty straightforward, but for ECMWF you
need to register, obtain a key, and download their Python package.

It works pretty well on my machine; let's see what users say.
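The "download when needed, clean up afterwards" approach proposed at the start of this thread can be wrapped in a small shell helper. This is a minimal sketch with several assumptions. The name `with_example_data` is hypothetical. It assumes `curl` is installed and that the provider serves the file at a plain URL, so ECMWF's registration and API-key step is not covered. The helper runs the command itself rather than returning a path, so the scratch copy is always deleted, even when the command fails.

```shell
#!/bin/sh
# Hypothetical helper: download a data file into a scratch directory, run a
# command with the file's path as its last argument, then delete the scratch
# directory whether or not the command succeeded.
# (POSIX sh has no local variables, so url/file/status/scratch are global.)
with_example_data() {
  url=$1
  shift
  scratch=$(mktemp -d) || return 1
  file="$scratch/$(basename "$url")"
  status=0
  if curl -fsSL -o "$file" "$url"; then
    "$@" "$file" || status=$?
  else
    echo "download failed: $url" >&2
    status=1
  fi
  rm -rf "$scratch"
  return "$status"
}
```

A usage example with a placeholder URL would be `with_example_data "https://data.example.org/obs4MIPs/sample.nc" ncdump -h`. Here `ncdump -h` is the NetCDF command-line tool that prints a file's header.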

Regards,
Denis





Re: Our new repository is a bit bloated

Posted by Cameron Goodale <go...@apache.org>.
Mike and Denis,

I have also experienced the slow checkout and was shocked when I saw over
400 MB of data in the repo. I agree that hosting the *.nc files outside of
the source code repo is the best solution to that issue, and the internal JPL
PowerPoints and docs really should be purged too.
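To see where the space in a repository like this goes, git can report the pack size and list the biggest files anywhere in history, including ones already deleted from the current tree. Below is a minimal sketch in a throwaway sandbox, assuming a hypothetical 1 MiB `old_data.nc` that was committed and later deleted. `git count-objects -vH` reports on-disk size. The `rev-list`/`cat-file` pipeline ranks every blob reachable from any ref by size. Paths containing spaces would be cut short by the simple field splitting used here.

```shell
#!/bin/sh
# Sketch: measure a repository and find the largest files anywhere in its history.
set -e

sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Example Dev"
git config user.email "dev@example.org"

# Hypothetical stand-in for an old data file: 1 MiB of zeros, added then deleted.
dd if=/dev/zero of=old_data.nc bs=1024 count=1024 2>/dev/null
echo "docs" > README
git add old_data.nc README
git commit -q -m "Add data"
git rm -q old_data.nc
git commit -q -m "Remove data"

# Object counts and on-disk sizes (see the "size" and "size-pack" lines).
git count-objects -vH

# Every blob reachable from any ref, as "size path", largest first.
blobs=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -n -r)
echo "$blobs" | head -n 5

largest=$(echo "$blobs" | head -n 1 | cut -d ' ' -f 2)
echo "largest file in history: $largest"
```

Even though `old_data.nc` is no longer in the working tree, it still ranks first, because the first commit keeps it in the repository.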

If I can help out, let me know.


-Cam




Re: Our new repository is a bit bloated

Posted by Michael Joyce <jo...@apache.org>.
I think that would be great Denis! I can go ahead and look at doing
something similar for the other ocw/ocw-ui components as well. I'm sure
this will help us out a good bit.

Thanks!


-- Joyce



Re: Our new repository is a bit bloated

Posted by "denis.nadeau" <de...@nasa.gov>.
Michael,

I like the idea of having the NetCDF files in an external repository.

I was thinking that it might be better to point people to the satellite
data at the different DAACs so that they can download the files directly.
That would work for the "obs4MIPs" program. I would feel better about it
as well: I have been worried that some data providers (ECMWF) would tell
us we are not authorized to distribute their original data. I did not
think about this when I first checked in my code.

I just found out that ECMWF now allows people to download their data as
NetCDF instead of GRIB using Python [1]. I tried it before, but could only
retrieve GRIB data and did not want to mess with GrADS ctl files and the
CDMS2/CDAT package. So now I could just create a script to download the
right files and rename them to the appropriate filenames for the obs4MIPs
examples.
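A minimal sketch of the kind of script described here, in Python since that is what the ECMWF route uses. Everything in it is hypothetical: `fetched_datasets` and `http_fetch` are illustrative names, the fetch function is a placeholder (a plain HTTP download by default; an ECMWF Web API or DAAC-specific download could be plugged in instead), and the mapping from download location to filename is supplied by the caller rather than derived from any official obs4MIPs naming rule. Each file is downloaded straight to the name the examples expect inside a temporary directory, and that directory is deleted when the `with` block exits, so nothing large needs to be committed to the repository.

```python
import contextlib
import os
import shutil
import tempfile
import urllib.request
from typing import Callable, Dict, Iterator

# A downloader takes (source, destination_path) and writes the file.
# Hypothetical interface: an ECMWF or DAAC client could be wrapped to fit it.
Fetcher = Callable[[str, str], None]


def http_fetch(url: str, dest: str) -> None:
    """Download url to dest using only the standard library."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


@contextlib.contextmanager
def fetched_datasets(
    sources: Dict[str, str], fetch: Fetcher = http_fetch
) -> Iterator[Dict[str, str]]:
    """Fetch each source into a temp dir under the filename the examples expect.

    sources maps a download location to its target filename (the "rename"
    step). Yields {target filename: local path}. The temporary directory and
    everything in it is removed on exit, even if a download fails.
    """
    workdir = tempfile.mkdtemp(prefix="ocw_data_")
    try:
        paths: Dict[str, str] = {}
        for source, target_name in sources.items():
            dest = os.path.join(workdir, target_name)
            fetch(source, dest)
            paths[target_name] = dest
        yield paths
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

For example, `with fetched_datasets({url: "tas_example.nc"}) as paths:` would give an example script a local file at `paths["tas_example.nc"]` for the duration of the block. Passing `fetch` explicitly also lets tests substitute a local stub instead of hitting the network.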

I would feel much better about this. Let me know what you think.

[1] https://software.ecmwf.int/wiki/display/WEBAPI/Accessing+ECMWF+data+servers+in+batch

Denis
On 3/10/14 1:06 PM, Michael Joyce wrote:
> Hi guys,
>
> An unfortunate side effect of our export from SVN to Git is that we've
> ended up with a rather bloated repository. We've had a large number of
> binary files in our repo in the past and all of this has been rolled up
> into an obnoxious ~500 MB pack file. I've been completely unable to clone
> the repo on my home internet because it constantly times out and it's
> painfully slow on my faster work connection.
>
> To fix this problem I suggest we do the following:
> - Remove all binary files from our repo and host them externally. For
> example, NetCDF files can be downloaded when they're needed and cleaned up
> afterwards (for tests or examples).
> - Remove all the bloat from our pack file. I was digging through stuff
> earlier and found a number of very large and outdated files in our pack
> file (~300 MB NC file, internal JPL presentations/files from a long time
> ago, etc.). We should be able to use [1] to help automate this for us,
> although we can also take care of it on our own if need be.
>
> Let me know what you guys think the best course of action is. That being
> said, dealing with this sooner rather than later would be nice =D
>
> [1] https://github.com/cmaitchison/git_diet
>
> -- Joyce
>
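Locating the large, outdated files mentioned in the quoted message can be done with plain git, independently of git_diet. A common recipe pipes `git rev-list --objects --all` (every object reachable from any ref, with its path) into `git cat-file --batch-check` using a custom format that adds each object's type and size. The sketch below wraps that recipe in Python; the helper names are hypothetical and it assumes `git` is on the PATH. It only reports candidates. Actually purging them from history still requires a history-rewriting tool such as git_diet.

```python
import subprocess
from typing import List, Tuple

# Each result is (size in bytes, object id, path in the tree).
Blob = Tuple[int, str, str]

BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output: str, top_n: int = 10) -> List[Blob]:
    """Parse `git cat-file --batch-check=BATCH_FORMAT` output and return
    the top_n largest blobs, biggest first."""
    blobs: List[Blob] = []
    for line in output.splitlines():
        # maxsplit=3 keeps paths that contain spaces intact
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and tags
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    blobs.sort(reverse=True)
    return blobs[:top_n]


def largest_blobs(repo: str = ".", top_n: int = 10) -> List[Blob]:
    """List all reachable objects, then ask git for their types and sizes."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "cat-file", "--batch-check=" + BATCH_FORMAT],
        cwd=repo, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes, top_n)
```

Run from inside a clone, `largest_blobs(top_n=20)` would list the twenty biggest blobs in the history along with the paths they were committed under, which is usually enough to decide what to strip.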

