Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2015/12/04 21:57:39 UTC

Lucene/Solr git mirror will soon turn off

Hello devs,

The infra team has notified us (Lucene/Solr) that in 26 days our
git-svn mirror will be turned off, because running it consumes too
many system resources, affecting other projects, apparently because of
a memory leak in git-svn.

Does anyone know of a link to this git-svn issue?  Is it a known
issue?  If there's something simple we can do (remove old jars from
our svn history, remove old branches), maybe we can sidestep the issue
and infra will allow it to keep running?

Or maybe someone in the Lucene/Solr dev community with prior
experience with git-svn could volunteer to play with it to see if
there's a viable solution, maybe with command-line options e.g. to
only mirror specific branches (trunk, 5.x)?

Or maybe it's time for us to switch to git, but there are problems
there too, e.g. we are currently missing large parts of our svn
history from the mirror now and it's not clear whether that would be
fixed if we switched:
https://issues.apache.org/jira/browse/INFRA-10828  Also, because we
used to add JAR files to svn, the "git clone" would likely take
several GBs unless we remove those JARs from our history.

Or if anyone has any other ideas, we should explore them, because
otherwise in 26 days there will be no more updates to the git mirror
of Lucene and Solr sources...

Thanks,

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
In the original report, the Infrastructure team said that throwing
memory at it did not solve the problem. And I believe they threw *a lot*
of memory at it.

There may well be other options - just needs someone to dive in and
look!

Upayavira

On Fri, Dec 4, 2015, at 11:10 PM, Alexandre Rafalovitch wrote:
> Maybe a silly question, but has anybody actually looked into
> git-svn itself? E.g. talking to the git-svn team with our example to help
> them troubleshoot the link, or running a test sync under a profiler.
> Also, it is running into OOM, but how big is the system doing the sync?
> If the issue is upgrading the server from 8GB of memory to 16GB, this
> might be an easier/cheaper course than moving the whole infrastructure
> around. I am sure Lucidworks or Elastic could probably sponsor a
> couple hundred bucks for a memory upgrade if that turned out to be the
> real problem. :-)
> 
> Reading JIRA, I get a feeling that this problem with git-svn is mostly
> treated as a blackbox. It feels like there might be other options.
> 
> Regards,
>    Alex.
> 
> On 4 December 2015 at 15:57, Michael McCandless
> <lu...@mikemccandless.com> wrote:
> > The infra team has notified us (Lucene/Solr) that in 26 days our
> > git-svn mirror will be turned off, because running it consumes too
> > many system resources, affecting other projects, apparently because of
> > a memory leak in git-svn.


Re: Lucene/Solr git mirror will soon turn off

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Maybe a silly question, but has anybody actually looked into
git-svn itself? E.g. talking to the git-svn team with our example to help
them troubleshoot the link, or running a test sync under a profiler.
Also, it is running into OOM, but how big is the system doing the sync?
If the issue is upgrading the server from 8GB of memory to 16GB, this
might be an easier/cheaper course than moving the whole infrastructure
around. I am sure Lucidworks or Elastic could probably sponsor a
couple hundred bucks for a memory upgrade if that turned out to be the
real problem. :-)

Reading JIRA, I get a feeling that this problem with git-svn is mostly
treated as a blackbox. It feels like there might be other options.

Regards,
   Alex.

On 4 December 2015 at 15:57, Michael McCandless
<lu...@mikemccandless.com> wrote:
> The infra team has notified us (Lucene/Solr) that in 26 days our
> git-svn mirror will be turned off, because running it consumes too
> many system resources, affecting other projects, apparently because of
> a memory leak in git-svn.



Re: Lucene/Solr git mirror will soon turn off

Posted by Scott Blum <dr...@gmail.com>.
Ouch... not having an official mirror would be a huge burden on those of us
managing org-specific forks. :(


Re: Lucene/Solr git mirror will soon turn off

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
The only downside is GitHub is a convenient way to run blame, etc. It's
very convenient for sleuthing through code. (If only their search wasn't
abysmal in terms of relevancy, but I digress)

Is the more systemic problem the large binaries that were checked in in the
past? Can we do any surgery to svn or git to remove these? IIRC this is one
reason we avoided changing from svn to git to begin with. If removing some
jars from an old version of Lucene fixes it, perhaps this is a better
long-term solution. I suppose the issue is having someone with the right
svn/git skills and the time to pull this off.

Doug

On Friday, December 4, 2015, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> This looks like a good idea to me. Maybe we just have a limited amount of
> history and branches in Git/Github, so people can work and create pull
> requests. Nobody wants to create pull request on a very old branch or
> against a revision years ago.
>
> Maybe Infra can mirror only the last 2 years of trunk and branch_5x?
>
> Uwe

-- 
Doug Turnbull | Search Relevance Consultant | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.

RE: Lucene/Solr git mirror will soon turn off

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

This looks like a good idea to me. Maybe we could just keep a limited amount of history and branches in Git/GitHub, so people can work and create pull requests. Nobody wants to create a pull request on a very old branch or against a revision from years ago.

Maybe Infra can mirror only the last 2 years of trunk and branch_5x?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Dyer, James [mailto:James.Dyer@ingramcontent.com]
> Sent: Friday, December 04, 2015 10:48 PM
> To: dev@lucene.apache.org
> Cc: infrastructure@apache.org
> Subject: RE: Lucene/Solr git mirror will soon turn off
> 
> I know Infra has tried a number of things to resolve this, to no avail.  But did
> we try "git-svn --revision=<n>" to only mirror "post-LUCENE-3930" (ivy,
> r1307099)?  Or if that's not lean enough for the git-svn mirror to work, then
> cut off when 4.x was branched or whenever.  The hope would be to give git
> users enough of the past that it would be useful for new development but
> then also we can retain the status quo with svn (which is the best path for a
> 26-day timeframe).
> 
> James Dyer
> Ingram Content Group


RE: Lucene/Solr git mirror will soon turn off

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
I know Infra has tried a number of things to resolve this, to no avail.  But did we try "git-svn --revision=<n>" to only mirror "post-LUCENE-3930" (ivy, r1307099)?  Or if that's not lean enough for the git-svn mirror to work, then cut off when 4.x was branched or whenever.  The hope would be to give git users enough of the past that it would be useful for new development but then also we can retain the status quo with svn (which is the best path for a 26-day timeframe).

James Dyer
Ingram Content Group
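
A minimal sketch of the truncated mirror suggested above, assuming the standard git-svn revision-range syntax (START:HEAD). The function name and the target directory are hypothetical, and the layout options a real mirror would need (trunk/branches/tags mapping) are left out because they depend on how Infra configured the mirror. The function only prints the command so it can be reviewed before anyone runs it.

```shell
# Print the git-svn command for a clone that starts at a given SVN revision.
# Nothing is executed against the repository.
# $1 = first SVN revision to keep, $2 = repository URL, $3 = target directory
truncated_clone_cmd() {
  printf 'git svn clone --revision %s:HEAD %s %s\n' "$1" "$2" "$3"
}

# r1307099 is the LUCENE-3930 (ivy) revision mentioned above;
# "lucene-git" is a hypothetical directory name.
truncated_clone_cmd 1307099 https://svn.apache.org/repos/asf/lucene lucene-git
```

Starting the range later (for example where 4.x was branched) would only change the first argument.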




Re: Lucene/Solr git mirror will soon turn off

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Tue, Dec 8, 2015 at 2:05 PM, Upayavira <uv...@odoko.co.uk> wrote:

> Have we heard anything more from Infrastructure?

Alas, no, unfortunately, at least from what I've seen ...

I would love to know if this memory leak in git-svn is a known issue
so we can be more informed (we've asked several times I think, but no
answer that I've seen, which could just be because it is NOT a known
issue).

Dawid also asked a few days ago for an export from svn so he could
play with git-svn himself (thank you for "volunteering" Dawid!), at
https://issues.apache.org/jira/browse/INFRA-10828 but no response so
far.  This seems the most promising lead to me so far, if only infra
could get the bits to Dawid soon...

Paul Elschot continues to improve his script (thank you!) to work
around delayed git mirroring:
https://issues.apache.org/jira/browse/LUCENE-6922 ... it seems this
may be the only option for git users come Dec 30, unless we decide
soon to do a full cutover from svn to git.

> Once the release is done, I'd be happy to try and get that conversation going faster than it is.

That would be wonderful, thank you!

Mike McCandless

http://blog.mikemccandless.com



Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
Have we heard anything more from Infrastructure? It seems the thing to
do right now is to get more of a conversation going with them to
understand the issue at hand. Once the release is done, I'd be happy to
try and get that conversation going faster than it is.

Upayavira

On Tue, Dec 8, 2015, at 06:57 PM, Dennis Gove wrote:
> github will reject files larger than 100MB and will warn for files
> larger than 50MB
> (https://help.github.com/articles/working-with-large-files/). They
> have recently released Git Large File Storage to alleviate issues
> caused by these restrictions
> (https://github.com/blog/1986-announcing-git-large-file-storage-lfs)
> but there is a cost associated with using such a thing so I would
> imagine that path is a no-go. The limit is on a per-file basis and in
> other projects I've gotten around it by using split to split large
> files before adding to a github repo and then using cat to combine the
> pieces back before using the file. I'm not sure how feasible a
> solution that would be for us but perhaps we could add hooks to do the
> splitting and cat-ing automatically for users.
>
> I'm in favor of a full switch to git (and github).
>
> Doing so would require changes to the ant build scripts as at least one
> command (package and related package commands) requires an svn
> checkout to add some information to the created package. We'd have to
> change that logic to instead look at git metadata.
>
> On Mon, Dec 7, 2015 at 2:48 AM, Dawid Weiss
> <da...@gmail.com> wrote:
>> I tried it once (for storing large text files -- Polish dictionaries,
>> uncompressed -- on github), but it simply didn't work. More headaches
>> than benefits (to me).
>>
>> Dawid
>>
>> On Sun, Dec 6, 2015 at 10:04 PM, Doug Turnbull
>> <dt...@opensourceconnections.com> wrote:
>>> I had not heard of git-lfs; it looks promising
>>>
>>> https://git-lfs.github.com/?utm_source=github_site&utm_medium=blog&utm_campaign=gitlfs
>>>
>>> On Sunday, December 6, 2015, Jan Høydahl
>>> <ja...@cominvent.com> wrote:
>>>> If the size of historic jars is the problem here, would looking into
>>>> git-lfs for *.jar be one workaround? I might also be totally off
>>>> here :-)
>>>>
>>>> --
>>>> Jan Høydahl, search solution architect
>>>> Cominvent AS - www.cominvent.com
>>>>
>>>> On 6 Dec 2015, at 00:46, Scott Blum <dr...@gmail.com> wrote:
>>>>> If lucene was a new project being started today, is there any
>>>>> question about whether it would be managed in svn or git?  If not,
>>>>> this might be a good impetus for moving to a better world.
>>>>>
>>>>> On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley
>>>>> <ys...@gmail.com> wrote:
>>>>>> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
>>>>>> <da...@gmail.com> wrote:
>>>>>>> I understand Gus; but we’d like to separate the question of
>>>>>>> whether we should move from svn to git from fixing the git
>>>>>>> mirror.
>>>>>>
>>>>>> Except moving to git is one path to fixing the issue, so it's not
>>>>>> really separate.
>>>>>> Given the multiple problems that the svn-git bridge seems to
>>>>>> have (both memory leaks + history), perhaps the sooner we switch
>>>>>> to git, the better.
>>>>>>
>>>>>> -Yonik
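
The split/cat workaround Dennis describes in the quoted message above can be sketched with two small shell functions. This is only an illustration under stated assumptions: the function names and the ".part-" suffix are hypothetical, and nothing here wires the steps into git hooks as he suggests.

```shell
# Split a large file into pieces of at most $2 bytes each.
# Pieces are written next to the original as <file>.part-aa, <file>.part-ab, ...
# $1 = file to split, $2 = maximum piece size in bytes
split_large_file() {
  split -b "$2" "$1" "$1.part-"
}

# Reassemble the pieces into the original file name.
# The shell expands the glob in sorted order, which matches the
# suffix order that split produces (aa, ab, ac, ...).
# $1 = original file name
join_parts() {
  cat "$1".part-* > "$1"
}
```

To stay under the 100MB per-file limit mentioned above, the piece size would be set somewhat below it, for example split_large_file big.jar 50000000, with big.jar itself kept out of the repository.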

Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
Grant's record 1.3GB commit added HTML files with Javadocs to the
CMS... probably not that relevant either.

svn log -v -r1240618 https://svn.apache.org/repos/asf/lucene

It's fun exploring, actually... I bet with a few proper exclusions one
can get down to manageable size. As always with conversions between
version management systems, the question remains how to map tags/
branches to their corresponding git concepts, etc.

D.

On Tue, Dec 8, 2015 at 10:16 PM, Dawid Weiss <da...@gmail.com> wrote:
> One more thing, perhaps of importance, the raw Lucene repo contains
> all the history of projects that then turned top-level (Nutch,
> Mahout). These could also be dropped (or ignored) when converting to
> git. If we agree JARs are not relevant, why should projects not
> directly related to Lucene/ Solr be?
>
> Dawid
>
> On Tue, Dec 8, 2015 at 10:05 PM, Dawid Weiss <da...@gmail.com> wrote:
>>> Don’t know how much we have of historic jars in our history.
>>
>> I actually do know. Or will know. In about ~10 hours. I wrote a script
>> that does the following:
>>
>> 1) git log all revisions touching https://svn.apache.org/repos/asf/lucene
>> 2) grep revision numbers
>> 3) use svnrdump to get every single commit (revision) above, in
>> incremental mode.
>>
>> This will allow me to:
>>
>> 1) recreate only Lucene/ Solr SVN, locally.
>> 2) measure the size of SVN repo.
>> 3) measure the size of any conversion to git (even if it's one-by-one
>> checkout, then-sync with git).
>>
>> From what I see up until now size should not be an issue at all. Even
>> with all binary blobs so far the SVN incremental dumps measure ~3.7G
>> (and I'm about 75% done). There is one interesting super-large commit,
>> this one:
>>
>> svn log -r1240618 https://svn.apache.org/repos/asf/lucene
>> ------------------------------------------------------------------------
>> r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (Sat, 04 Feb 2012) | 1 line
>>
>> LUCENE-2748: bring in old Lucene docs
>>
>> This commit diff weights... wait for it... 1.3G! I didn't check what
>> it actually was.
>>
>> Will keep you posted.
>>
>> D.
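
The three steps in the quoted message above could look roughly like the sketch below. It is not Dawid's actual script: the function names and dump file naming are hypothetical, and it assumes step 1 means "svn log" (the message says "git log") since the URL is an SVN repository.

```shell
# Repository URL as given in the thread.
REPO_URL="https://svn.apache.org/repos/asf/lucene"

# Step 2: keep only the revision numbers from "svn log" header lines
# such as "r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (...) | 1 line".
extract_revisions() {
  sed -n -E 's/^r([0-9]+) \|.*/\1/p'
}

# Steps 1 and 3: list every revision touching the path, then fetch each
# one as an incremental dump into its own file (r<rev>.dump, a made-up name).
dump_all_revisions() {
  svn log -q "$REPO_URL" | extract_revisions | sort -n |
  while read -r rev; do
    svnrdump dump --incremental -r "$rev" "$REPO_URL" > "r$rev.dump"
  done
}
```

Only the function definitions are evaluated here; calling dump_all_revisions would contact the ASF server and, going by the estimate above, run for many hours.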



Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
You can't avoid having the history in SVN. The ASF has one large repo,
and won't be deleting that repo, so the history will survive in
perpetuity, regardless of what we do now.

Upayavira

On Tue, Dec 8, 2015, at 09:24 PM, Doug Turnbull wrote:
> It seems you'd want to preserve that history in a frozen/archived
> Apache SVN repo for Lucene. Then make the new git repo slimmer before
> switching. Folks that want very old versions or are doing research can
> at least go through the original SVN repo.

Re: Lucene/Solr git mirror will soon turn off

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
It seems you'd want to preserve that history in a frozen/archived Apache
SVN repo for Lucene. Then make the new git repo slimmer before switching.
Folks that want very old versions or are doing research can at least go
through the original SVN repo.

On Tuesday, December 8, 2015, Dawid Weiss <da...@gmail.com> wrote:

> One more thing, perhaps of importance, the raw Lucene repo contains
> all the history of projects that then turned top-level (Nutch,
> Mahout). These could also be dropped (or ignored) when converting to
> git. If we agree JARs are not relevant, why should projects not
> directly related to Lucene/ Solr be?
>
> Dawid

Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
> So you're trying to minimise the size of a git clone?

Yes and no. I'm just a curious individual. My gut feeling is that even
with 10+ years of history and binary blobs inside, the size of the
repo (git or SVN) should *not* be much of a problem. It's merely ~47k
worth of revisions... :)

Dawid



Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
So you're trying to minimise the size of a git clone?

I'd agree that Nutch etc aren't relevant.

Upayavira

On Tue, Dec 8, 2015, at 09:16 PM, Dawid Weiss wrote:
> One more thing, perhaps of importance, the raw Lucene repo contains
> all the history of projects that then turned top-level (Nutch,
> Mahout). These could also be dropped (or ignored) when converting to
> git. If we agree JARs are not relevant, why should projects not
> directly related to Lucene/ Solr be?
> 
> Dawid
> 
> On Tue, Dec 8, 2015 at 10:05 PM, Dawid Weiss <da...@gmail.com>
> wrote:
> >> Don’t know how much we have of historic jars in our history.
> >
> > I actually do know. Or will know. In about ~10 hours. I wrote a script
> > that does the following:
> >
> > 1) git log all revisions touching https://svn.apache.org/repos/asf/lucene
> > 2) grep revision numbers
> > 3) use svnrdump to get every single commit (revision) above, in
> > incremental mode.
> >
> > This will allow me to:
> >
> > 1) recreate only Lucene/ Solr SVN, locally.
> > 2) measure the size of SVN repo.
> > 3) measure the size of any conversion to git (even if it's one-by-one
> > checkout, then-sync with git).
> >
> > From what I see up until now size should not be an issue at all. Even
> > with all binary blobs so far the SVN incremental dumps measure ~3.7G
> > (and I'm about 75% done). There is one interesting super-large commit,
> > this one:
> >
> > svn log -r1240618 https://svn.apache.org/repos/asf/lucene
> > ------------------------------------------------------------------------
> > r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (Sat, 04 Feb 2012) | 1 line
> >
> > LUCENE-2748: bring in old Lucene docs
> >
> > This commit diff weights... wait for it... 1.3G! I didn't check what
> > it actually was.
> >
> > Will keep you posted.
> >
> > D.
> 
> 



Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
One more thing, perhaps of importance, the raw Lucene repo contains
all the history of projects that then turned top-level (Nutch,
Mahout). These could also be dropped (or ignored) when converting to
git. If we agree JARs are not relevant, why should projects not
directly related to Lucene/ Solr be?
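
One way to sketch this, assuming the conversion goes through git-svn: its --ignore-paths option takes a regular expression and skips every matching path. The directory names below ("nutch", "mahout" directly under the repository URL) and the helper names are assumptions for illustration; the real layout of the repository is not described in this thread. The helpers only build the command line, they do not run it.

```python
import re


def ignore_paths_regex(names):
    """Build a regex that matches any path starting with one of the given top-level names."""
    alternatives = "|".join(re.escape(name) for name in names)
    return rf"^({alternatives})(/|$)"


def git_svn_clone_command(url, target, ignored):
    """Return a `git svn clone` command line that skips the ignored sub-projects."""
    return [
        "git", "svn", "clone",
        f"--ignore-paths={ignore_paths_regex(ignored)}",
        url, target,
    ]
```

For example, git_svn_clone_command("https://svn.apache.org/repos/asf/lucene", "lucene-git", ["nutch", "mahout"]) produces a command whose regex is ^(nutch|mahout)(/|$).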

Dawid

On Tue, Dec 8, 2015 at 10:05 PM, Dawid Weiss <da...@gmail.com> wrote:
>> Don’t know how much we have of historic jars in our history.
>
> I actually do know. Or will know. In about ~10 hours. I wrote a script
> that does the following:
>
> 1) git log all revisions touching https://svn.apache.org/repos/asf/lucene
> 2) grep revision numbers
> 3) use svnrdump to get every single commit (revision) above, in
> incremental mode.
>
> This will allow me to:
>
> 1) recreate only Lucene/ Solr SVN, locally.
> 2) measure the size of SVN repo.
> 3) measure the size of any conversion to git (even if it's one-by-one
> checkout, then-sync with git).
>
> From what I see up until now size should not be an issue at all. Even
> with all binary blobs so far the SVN incremental dumps measure ~3.7G
> (and I'm about 75% done). There is one interesting super-large commit,
> this one:
>
> svn log -r1240618 https://svn.apache.org/repos/asf/lucene
> ------------------------------------------------------------------------
> r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (Sat, 04 Feb 2012) | 1 line
>
> LUCENE-2748: bring in old Lucene docs
>
> This commit diff weights... wait for it... 1.3G! I didn't check what
> it actually was.
>
> Will keep you posted.
>
> D.



Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
> Don’t know how much we have of historic jars in our history.

I actually do know. Or will know. In about ~10 hours. I wrote a script
that does the following:

1) git log all revisions touching https://svn.apache.org/repos/asf/lucene
2) grep revision numbers
3) use svnrdump to get every single commit (revision) above, in
incremental mode.

This will allow me to:

1) recreate only Lucene/ Solr SVN, locally.
2) measure the size of SVN repo.
3) measure the size of any conversion to git (even if it's one-by-one
checkout, then-sync with git).
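
A minimal sketch of the three script steps above; it is not the actual script, which is not shown in this thread. It reads step 1 as "svn log", since the URL is an SVN one; that reading, the function names and the regular expression are all assumptions. The functions only parse log text and build "svnrdump dump --incremental" command lines; nothing is executed.

```python
import re

SVN_URL = "https://svn.apache.org/repos/asf/lucene"

# Matches the header line of each `svn log` entry, for example:
# "r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (Sat, 04 Feb 2012) | 1 line"
_HEADER = re.compile(r"^r(\d+) \|", re.MULTILINE)


def revisions_from_log(log_text):
    """Steps 1-2: return the revision numbers found in `svn log` output, oldest first."""
    return sorted({int(number) for number in _HEADER.findall(log_text)})


def dump_command(revision, url=SVN_URL):
    """Step 3: build the svnrdump call that dumps one revision incrementally."""
    return ["svnrdump", "dump", "--incremental", "-r", str(revision), url]
```

Each command writes a dump stream to standard output, so the caller would redirect it to one file per revision and could then add up the file sizes.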

From what I see up until now size should not be an issue at all. Even
with all binary blobs so far the SVN incremental dumps measure ~3.7G
(and I'm about 75% done). There is one interesting super-large commit,
this one:

svn log -r1240618 https://svn.apache.org/repos/asf/lucene
------------------------------------------------------------------------
r1240618 | gsingers | 2012-02-04 22:45:17 +0100 (Sat, 04 Feb 2012) | 1 line

LUCENE-2748: bring in old Lucene docs

This commit diff weighs... wait for it... 1.3G! I didn't check what
it actually was.

Will keep you posted.

D.



Re: Lucene/Solr git mirror will soon turn off

Posted by Jan Høydahl <ja...@cominvent.com>.
The LFS cost at GitHub starts at >1 GB. Don’t know how much we have of historic jars in our history. Also, as far as I understand, Apache is free to install their own git-lfs server, so the repository will use an Apache-operated server for storing the large files instead of GitHub’s own storage service. Since we don’t check in jars anymore, this would be a one-time sync to populate LFS, and then git clients will get small pointer files in place of the large files, and will need to install git-lfs and run “git lfs fetch” in order to replace these with proper binaries.

However, until we know more about why the svn-git mirroring breaks, we cannot say whether LFS would help at all.

If we migrate to git (which I’m totally in favor of), I guess LFS could be a way to migrate ALL history at a lower cost, so new users can clone the whole repo faster. It will be the “git lfs fetch” stage that takes time if the user chooses to fetch various large files. For normal usage of current branches it will not be necessary. Win win.
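
To make the "small pointer files" concrete, here is a sketch of what Git LFS stores in the repository in place of a tracked file: a three-line text file naming the spec version, the SHA-256 of the real content and its size. The function name is an assumption for illustration, and the .gitattributes line is the one "git lfs track" writes for a *.jar pattern as far as I know.

```python
import hashlib

# Line that `git lfs track "*.jar"` adds to .gitattributes.
GITATTRIBUTES_LINE = "*.jar filter=lfs diff=lfs merge=lfs -text"


def lfs_pointer(content: bytes) -> str:
    """Return the pointer-file text that stands in for `content` in the git repository."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )
```

A multi-megabyte jar therefore costs roughly 130 bytes per version in the git history. As far as I know, "git lfs pull" both downloads the objects and replaces the pointers in the working tree, while "git lfs fetch" only downloads them.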

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 8 Dec 2015, at 19:57, Dennis Gove <dp...@gmail.com> wrote:
> 
> github will reject files larger than 100MB and will warn for files larger than 50MB (https://help.github.com/articles/working-with-large-files/). They have recently released Git Large File Storage to alleviate issues caused by these restrictions (https://github.com/blog/1986-announcing-git-large-file-storage-lfs) but there is a cost associated with using such a thing so I would imagine that path is a no-go. The limit is on a per-file basis and in other projects I've gotten around it by using split to split large files before adding to a github repo and then using cat to combine the pieces back before using the file. I'm not sure how feasible of a solution that would be for us but perhaps we could add hooks to do the split-ting and cat-ing automatically for users.
> 
> I'm in favor of a full switch to git (and github).
> 
> Doing so would require changes to the ant build scripts as at least one command (package and related package commands) requires an svn checkout to add some information to the created package. We'd have to change that logic to instead look at git metadata.
> 
> On Mon, Dec 7, 2015 at 2:48 AM, Dawid Weiss <dawid.weiss@gmail.com> wrote:
> I tried it once (for storing large text files -- Polish dictionaries,
> uncompressed -- on github), but it simply didn't work. More headaches
> than benefits (to me).
> 
> Dawid
> 
> On Sun, Dec 6, 2015 at 10:04 PM, Doug Turnbull
> <dturnbull@opensourceconnections.com> wrote:
> > I had not heard of git-lfs looks promising
> >
> > https://git-lfs.github.com/?utm_source=github_site&utm_medium=blog&utm_campaign=gitlfs
> >
> >
> > On Sunday, December 6, 2015, Jan Høydahl <jan.asf@cominvent.com> wrote:
> >>
> >> If the size of historic jars is the problem here, would looking into
> >> git-lfs for *.jar be one workaround? I might also be totally off here :-)
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >> On 6 Dec 2015, at 00:46, Scott Blum <dragonsinth@gmail.com> wrote:
> >>
> >> If lucene was a new project being started today, is there any question
> >> about whether it would be managed in svn or git?  If not, this might be a
> >> good impetus for moving to a better world.
> >>
> >> On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley <yseeley@gmail.com> wrote:
> >>>
> >>> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
> >>> <david.w.smiley@gmail.com> wrote:
> >>> > I understand Gus; but we’d like to separate the question of whether we
> >>> > should
> >>> > move from svn to git from fixing the git mirror.
> >>>
> >>> Except moving to git is one path to fixing the issue, so it's not
> >>> really separate.
> >>> Given the multiple problems that the svn-git bridge seems to have (both
> >>> memory leaks + history), perhaps the sooner we switch to git, the
> >>> better.
> >>>
> >>> -Yonik
> >>>
> >>>
> >>
> >>
> >
> >
> > --
> > Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC |
> > 240.476.9983
> > Author: Relevant Search
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless of
> > whether attachments are marked as such.
> >
> 
> 
> 


Re: Lucene/Solr git mirror will soon turn off

Posted by Dennis Gove <dp...@gmail.com>.
github will reject files larger than 100MB and will warn for files larger
than 50MB (https://help.github.com/articles/working-with-large-files/).
They have recently released Git Large File Storage to alleviate issues
caused by these restrictions (
https://github.com/blog/1986-announcing-git-large-file-storage-lfs) but
there is a cost associated with using such a thing so I would imagine that
path is a no-go. The limit is on a per-file basis and in other projects
I've gotten around it by using split to split large files before adding to
a github repo and then using cat to combine the pieces back before using
the file. I'm not sure how feasible of a solution that would be for us but
perhaps we could add hooks to do the split-ting and cat-ing automatically
for users.
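
A rough sketch of the split/cat workaround in Python, equivalent in spirit to the split and cat commands mentioned above. The part-file naming scheme, the function names and the default chunk size (50 MB read as 50 * 1024 * 1024 bytes, to stay at the warning threshold) are assumptions; a hook could call these before commit and after checkout.

```python
from pathlib import Path

# Assumed default: keep each piece at the 50MB warning threshold quoted above.
DEFAULT_CHUNK_SIZE = 50 * 1024 * 1024


def split_file(path, chunk_size=DEFAULT_CHUNK_SIZE):
    """Write `path` out as numbered pieces of at most `chunk_size` bytes; return the pieces."""
    path = Path(path)
    parts = []
    with path.open("rb") as source:
        index = 0
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            # Zero-padded suffix so that sorting by name restores the original order.
            part = path.with_name(f"{path.name}.part{index:04d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, target):
    """Concatenate the pieces, in name order, back into a single file at `target`."""
    with Path(target).open("wb") as destination:
        for part in sorted(Path(p) for p in parts):
            destination.write(part.read_bytes())
```

Only the pieces would be committed; the original large file would be listed in .gitignore and rebuilt locally with join_files.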

I'm in favor of a full switch to git (and github).

Doing so would require changes to the ant build scripts as at least one
command (package and related package commands) requires an svn checkout to
add some information to the created package. We'd have to change that logic
to instead look at git metadata.
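
A small sketch of what "look at git metadata" could mean. The thread does not say which information the package target reads from the svn checkout, so this assumes it is a revision identifier, and it assumes the checkout type can be told apart by the presence of a .git or .svn directory. The function name is hypothetical; "git rev-parse HEAD" and "svnversion" are the standard commands for the two systems.

```python
from pathlib import Path


def revision_command(checkout_dir):
    """Return the command that prints the checked-out revision, or None if unknown."""
    root = Path(checkout_dir)
    if (root / ".git").exists():
        # Prints the full commit hash of the current checkout.
        return ["git", "rev-parse", "HEAD"]
    if (root / ".svn").exists():
        # Prints the working-copy revision (or revision range).
        return ["svnversion", str(root)]
    return None
```

A build script could run the returned command in the checkout directory and embed its output in the package, falling back to a placeholder such as "unknown" when None is returned.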

On Mon, Dec 7, 2015 at 2:48 AM, Dawid Weiss <da...@gmail.com> wrote:

> I tried it once (for storing large text files -- Polish dictionaries,
> uncompressed -- on github), but it simply didn't work. More headaches
> than benefits (to me).
>
> Dawid
>
> On Sun, Dec 6, 2015 at 10:04 PM, Doug Turnbull
> <dt...@opensourceconnections.com> wrote:
> > I had not heard of git-lfs looks promising
> >
> > https://git-lfs.github.com/?utm_source=github_site&utm_medium=blog&utm_campaign=gitlfs
> >
> >
> > On Sunday, December 6, 2015, Jan Høydahl <ja...@cominvent.com> wrote:
> >>
> >> If the size of historic jars is the problem here, would looking into
> >> git-lfs for *.jar be one workaround? I might also be totally off here :-)
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >> On 6 Dec 2015, at 00:46, Scott Blum <dr...@gmail.com> wrote:
> >>
> >> If lucene was a new project being started today, is there any question
> >> about whether it would be managed in svn or git?  If not, this might be a
> >> good impetus for moving to a better world.
> >>
> >> On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley <ys...@gmail.com> wrote:
> >>>
> >>> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
> >>> <da...@gmail.com> wrote:
> >>> > I understand Gus; but we’d like to separate the question of whether we
> >>> > should
> >>> > move from svn to git from fixing the git mirror.
> >>>
> >>> Except moving to git is one path to fixing the issue, so it's not
> >>> really separate.
> >>> Given the multiple problems that the svn-git bridge seems to have (both
> >>> memory leaks + history), perhaps the sooner we switch to git, the
> >>> better.
> >>>
> >>> -Yonik
> >>>
> >>>
> >>
> >>
> >
> >
> > --
> > Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC |
> > 240.476.9983
> > Author: Relevant Search
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless of
> > whether attachments are marked as such.
> >
>
>
>

Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
I tried it once (for storing large text files -- Polish dictionaries,
uncompressed -- on github), but it simply didn't work. More headaches
than benefits (to me).

Dawid

On Sun, Dec 6, 2015 at 10:04 PM, Doug Turnbull
<dt...@opensourceconnections.com> wrote:
> I had not heard of git-lfs looks promising
>
> https://git-lfs.github.com/?utm_source=github_site&utm_medium=blog&utm_campaign=gitlfs
>
>
> On Sunday, December 6, 2015, Jan Høydahl <ja...@cominvent.com> wrote:
>>
>> If the size of historic jars is the problem here, would looking into
>> git-lfs for *.jar be one workaround? I might also be totally off here :-)
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 6 Dec 2015, at 00:46, Scott Blum <dr...@gmail.com> wrote:
>>
>> If lucene was a new project being started today, is there any question
>> about whether it would be managed in svn or git?  If not, this might be a
>> good impetus for moving to a better world.
>>
>> On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley <ys...@gmail.com> wrote:
>>>
>>> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
>>> <da...@gmail.com> wrote:
>>> > I understand Gus; but we’d like to separate the question of whether we
>>> > should
>>> > move from svn to git from fixing the git mirror.
>>>
>>> Except moving to git is one path to fixing the issue, so it's not
>>> really separate.
>>> Given the multiple problems that the svn-git bridge seems to have (both
>>> memory leaks + history), perhaps the sooner we switch to git, the
>>> better.
>>>
>>> -Yonik
>>>
>>>
>>
>>
>
>
> --
> Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC |
> 240.476.9983
> Author: Relevant Search
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>



Re: Lucene/Solr git mirror will soon turn off

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
I had not heard of git-lfs; looks promising

https://git-lfs.github.com/?utm_source=github_site&utm_medium=blog&utm_campaign=gitlfs

On Sunday, December 6, 2015, Jan Høydahl <ja...@cominvent.com> wrote:

> If the size of historic jars is the problem here, would looking into
> git-lfs for *.jar be one workaround? I might also be totally off here :-)
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 6 Dec 2015, at 00:46, Scott Blum <dragonsinth@gmail.com> wrote:
>
> If lucene was a new project being started today, is there any question
> about whether it would be managed in svn or git?  If not, this might be a
> good impetus for moving to a better world.
>
> On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley <yseeley@gmail.com> wrote:
>
>> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
>> <david.w.smiley@gmail.com> wrote:
>> > I understand Gus; but we’d like to separate the question of whether we
>> > should move from svn to git from fixing the git mirror.
>>
>> Except moving to git is one path to fixing the issue, so it's not
>> really separate.
>> Given the multiple problems that the svn-git bridge seems to have (both
>> memory leaks + history), perhaps the sooner we switch to git, the
>> better.
>>
>> -Yonik
>>
>>
>>
>
>

-- 
Doug Turnbull | Search Relevance Consultant | OpenSource Connections, LLC
<http://opensourceconnections.com> | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.

Re: Lucene/Solr git mirror will soon turn off

Posted by Jan Høydahl <ja...@cominvent.com>.
If the size of historic jars is the problem here, would looking into git-lfs for *.jar be one workaround? I might also be totally off here :-)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 6 Dec 2015, at 00:46, Scott Blum <dr...@gmail.com> wrote:
> 
> If lucene was a new project being started today, is there any question about whether it would be managed in svn or git?  If not, this might be a good impetus for moving to a better world.
> 
> On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley <yseeley@gmail.com> wrote:
> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
> <david.w.smiley@gmail.com> wrote:
> > I understand Gus; but we’d like to separate the question of whether we should
> > move from svn to git from fixing the git mirror.
> 
> Except moving to git is one path to fixing the issue, so it's not
> really separate.
> Given the multiple problems that the svn-git bridge seems to have (both
> memory leaks + history), perhaps the sooner we switch to git, the
> better.
> 
> -Yonik
> 
> 
> 


Re: Lucene/Solr git mirror will soon turn off

Posted by Scott Blum <dr...@gmail.com>.
If lucene was a new project being started today, is there any question
about whether it would be managed in svn or git?  If not, this might be a
good impetus for moving to a better world.

On Sat, Dec 5, 2015 at 6:19 PM, Yonik Seeley <ys...@gmail.com> wrote:

> On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
> <da...@gmail.com> wrote:
> > I understand Gus; but we’d like to separate the question of whether we
> > should move from svn to git from fixing the git mirror.
>
> Except moving to git is one path to fixing the issue, so it's not
> really separate.
> Given the multiple problems that the svn-git bridge seems to have (both
> memory leaks + history), perhaps the sooner we switch to git, the
> better.
>
> -Yonik
>
>
>

Re: Lucene/Solr git mirror will soon turn off

Posted by Yonik Seeley <ys...@gmail.com>.
On Sat, Dec 5, 2015 at 5:53 PM, david.w.smiley@gmail.com
<da...@gmail.com> wrote:
> I understand Gus; but we’d like to separate the question of whether we should
> move from svn to git from fixing the git mirror.

Except moving to git is one path to fixing the issue, so it's not
really separate.
Given the multiple problems that the svn-git bridge seems to have (both
memory leaks + history), perhaps the sooner we switch to git, the
better.

-Yonik



Re: Lucene/Solr git mirror will soon turn off

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
I understand Gus; but we’d like to separate the question of whether we
should move from svn to git from fixing the git mirror.  It’s contentious —
I encourage you to search the list archives for some of the arguments.

On Sat, Dec 5, 2015 at 12:53 PM Gus Heck <gu...@gmail.com> wrote:

> If I understand this thread (perhaps not?), the issue comes from syncing
> git and svn? If we move to git only, all old versions and jars will live in
> svn so anyone who needs to build an old version is all set. The move to git
> can retain history without jars for "blame".
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Lucene/Solr git mirror will soon turn off

Posted by Gus Heck <gu...@gmail.com>.
If I understand this thread (perhaps not?), the issue comes from syncing
git and svn? If we move to git only, all old versions and jars will live in
svn so anyone who needs to build an old version is all set. The move to git
can retain history without jars for "blame".

Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
I'm fine if we drop the jars, really. I'm just fond of having a "real"
history of a project, that's all. And I don't think the conversion
problem stems from JARs alone; I think there's some other underlying
issue. I asked for a filtered dump of the svn repo branch, perhaps I
can experiment a bit and see what's going on.

Dawid

On Sat, Dec 5, 2015 at 6:41 PM, Erick Erickson <er...@gmail.com> wrote:
> re: keeping old jars around...
>
> Having all the old jars around is a nice idea, but do we know that
> anybody really cares?
>
> Straw-man two question poll:
>
> 1> What's the most recent version of Solr/Lucene you'd be OK with
> nuking the jars?
> 2> In the last year, what's the oldest version of Solr/Lucene you've
> built that had been released for more than 6 months? ("I never do
> this" is a fine answer)
>
> Wondering how much of this is a "Trip to Abilene". Long form:
> https://en.wikipedia.org/wiki/Abilene_paradox
>
> Short form:
> "a group of people collectively decide on a course of action that is
> counter to the preferences of many (or all) of the individuals in the
> group."
>
> On Fri, Dec 4, 2015 at 10:01 PM, david.w.smiley@gmail.com
> <da...@gmail.com> wrote:
>> I agree with Rob on this — delete the ‘jar’s from git history, for all the
>> reasons Rob said.  If someone wants to attempt to actually *build* an old
>> release, and thus needs the jars, then they are welcome to use ASF SVN
>> archives for that purpose instead, and even then apparently it will be a
>> challenge based on what I’ve read today.
>>
>> Anyway, maybe this will or maybe this won’t even solve the git-svn OOM
>> problem by itself?  It’s worth a shot to find out as a trial run; no?  Maybe
>> we could ask infra to try as an experiment.  If it doesn’t solve the problem
>> then we needn’t belabor this decision at this time — it can be resumed at a
>> future git transitional discussion, which is not the subject matter of the
>> current crisis.
>>
>> bq. I know you won't accept rational arguments. :)
>>
>> Dawid, please, let’s not provoke each other with that kind of talk.  The
>> smiley face doesn’t make it okay.
>>
>> ~ David
>>
>> On Fri, Dec 4, 2015 at 4:26 PM Dawid Weiss <da...@gmail.com> wrote:
>>>
>>> > I don't think jar files are 'history' and it was a mistake we had so
>>> > many in source control before we cleaned that up. it is much better
>>> > without them.
>>>
>>> Depends how you look at it. If your goal is to be able to actually
>>> build ancient versions then dropping those JARs is going to be a real
>>> pain. I think they should stay. Like I said, git is smart enough to
>>> omit objects that aren't referenced from the cloned branch. The
>>> conversion from SVN would have to be smart, but it's all doable.
>>>
>>> > this bloats the repository, makes clone slow for someone new who just
>>> > wants to check it out to work on it, etc.
>>>
>>> No, not really. There is a dozen ways to do it without cloning the
>>> full repo (provide a patch with --depth 1, clone a selective branch,
>>> etc.). We've had that discussion before. I know you won't accept
>>> rational arguments. :)
>>>
>>> D.
>>>
>>>
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>
>



Re: Lucene/Solr git mirror will soon turn off

Posted by Erick Erickson <er...@gmail.com>.
re: keeping old jars around...

Having all the old jars around is a nice idea, but do we know that
anybody really cares?

Straw-man two question poll:

1> What's the most recent version of Solr/Lucene you'd be OK with
nuking the jars?
2> In the last year, what's the oldest version of Solr/Lucene you've
built that had been released for more than 6 months? ("I never do
this" is a fine answer)

Wondering how much of this is a "Trip to Abilene". Long form:
https://en.wikipedia.org/wiki/Abilene_paradox

Short form:
"a group of people collectively decide on a course of action that is
counter to the preferences of many (or all) of the individuals in the
group."

On Fri, Dec 4, 2015 at 10:01 PM, david.w.smiley@gmail.com
<da...@gmail.com> wrote:
> I agree with Rob on this — delete the ‘jar’s from git history, for all the
> reasons Rob said.  If someone wants to attempt to actually *build* an old
> release, and thus needs the jars, then they are welcome to use ASF SVN
> archives for that purpose instead, and even then apparently it will be a
> challenge based on what I’ve read today.
>
> Anyway, maybe this will or maybe this won’t even solve the git-svn OOM
> problem by itself?  It’s worth a shot to find out as a trial run; no?  Maybe
> we could ask infra to try as an experiment.  If it doesn’t solve the problem
> then we needn’t belabor this decision at this time — it can be resumed at a
> future git transitional discussion, which is not the subject matter of the
> current crisis.
>
> bq. I know you won't accept rational arguments. :)
>
> Dawid, please, let’s not provoke each other with that kind of talk.  The
> smiley face doesn’t make it okay.
>
> ~ David
>
> On Fri, Dec 4, 2015 at 4:26 PM Dawid Weiss <da...@gmail.com> wrote:
>>
>> > I don't think jar files are 'history' and it was a mistake we had so
>> > many in source control before we cleaned that up. it is much better
>> > without them.
>>
>> Depends how you look at it. If your goal is to be able to actually
>> build ancient versions then dropping those JARs is going to be a real
>> pain. I think they should stay. Like I said, git is smart enough to
>> omit objects that aren't referenced from the cloned branch. The
>> conversion from SVN would have to be smart, but it's all doable.
>>
>> > this bloats the repository, makes clone slow for someone new who just
>> > wants to check it out to work on it, etc.
>>
>> No, not really. There is a dozen ways to do it without cloning the
>> full repo (provide a patch with --depth 1, clone a selective branch,
>> etc.). We've had that discussion before. I know you won't accept
>> rational arguments. :)
>>
>> D.
>>
>>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com



Re: Lucene/Solr git mirror will soon turn off

Posted by "david.w.smiley@gmail.com" <da...@gmail.com>.
I agree with Rob on this — delete the ‘jar’s from git history, for all the
reasons Rob said.  If someone wants to attempt to actually *build* an old
release, and thus needs the jars, then they are welcome to use ASF SVN
archives for that purpose instead, and even then apparently it will be a
challenge based on what I’ve read today.
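
For reference, one common way to drop files from git history is git filter-branch with an index filter. The sketch below only builds that command line; the function name and the default *.jar pattern are assumptions, and nothing in this thread says this is the approach infra or the project would take. It rewrites every commit, so it belongs on a scratch clone.

```python
import shlex


def strip_jars_command(pattern="*.jar"):
    """Build a `git filter-branch` call that removes matching files from every commit."""
    # Runs against each commit's index; --ignore-unmatch keeps commits without jars from failing.
    index_filter = (
        "git rm -r --cached --ignore-unmatch --quiet -- " + shlex.quote(pattern)
    )
    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",   # drop commits that become empty once the jars are gone
        "--", "--all",     # rewrite all refs, not just the current branch
    ]
```

Comparing the size of the .git directory before and after such a rewrite (following a garbage collection) would show how much of the clone size the jars account for.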

Anyway, maybe this will or maybe this won’t even solve the git-svn OOM
problem by itself?  It’s worth a shot to find out as a trial run; no?
Maybe we could ask infra to try as an experiment.  If it doesn’t solve the
problem then we needn’t belabor this decision at this time — it can be
resumed at a future git transitional discussion, which is not the subject
matter of the current crisis.

bq. I know you won't accept rational arguments. :)

Dawid, please, let’s not provoke each other with that kind of talk.  The
smiley face doesn’t make it okay.

~ David

On Fri, Dec 4, 2015 at 4:26 PM Dawid Weiss <da...@gmail.com> wrote:

> > I don't think jar files are 'history' and it was a mistake we had so
> > many in source control before we cleaned that up. it is much better
> > without them.
>
> Depends how you look at it. If your goal is to be able to actually
> build ancient versions then dropping those JARs is going to be a real
> pain. I think they should stay. Like I said, git is smart enough to
> omit objects that aren't referenced from the cloned branch. The
> conversion from SVN would have to be smart, but it's all doable.
>
> > this bloats the repository, makes clone slow for someone new who just
> > wants to check it out to work on it, etc.
>
> No, not really. There is a dozen ways to do it without cloning the
> full repo (provide a patch with --depth 1, clone a selective branch,
> etc.). We've had that discussion before. I know you won't accept
> rational arguments. :)
>
> D.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
As I said earlier - our history is inside the ASF SVN repo. The only way
our history would be lost would be if the whole repo was deleted, which
I suspect won't happen for a while. So even if we imported a snapshot
over to Git, our full SVN history is immutably stored in SVN (even if we
did svn rm on the whole tree).

Upayavira


On Fri, Dec 4, 2015, at 10:16 PM, Mark Miller wrote:
> Many old builds will also have problems even with a git checkout. If
> you actually wanted to try and build them it would be much more sane
> to work from the SVN history I'd hope we can retain.
>
> Mark
>
> On Fri, Dec 4, 2015 at 4:55 PM Robert Muir <rc...@gmail.com> wrote:
>> On Fri, Dec 4, 2015 at 4:25 PM, Dawid Weiss <da...@gmail.com> wrote:
>>>> I don't think jar files are 'history' and it was a mistake we had so
>>>> many in source control before we cleaned that up. it is much better
>>>> without them.
>>>
>>> Depends how you look at it. If your goal is to be able to actually
>>> build ancient versions then dropping those JARs is going to be a real
>>> pain. I think they should stay. Like I said, git is smart enough to
>>> omit objects that aren't referenced from the cloned branch. The
>>> conversion from SVN would have to be smart, but it's all doable.
>>
>> I mentioned this same issue the last thread where we discussed that, I
>> do recommend to try to actually compile these old versions.
>>
>> As an experiment, I checked out the release tag for 4.2
>> (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_0)
>> and ran 'ant compile'
>>
>> BUILD FAILED
>> /home/rmuir/lucene_solr_4_2_0/build.xml:107: The following error
>> occurred while executing this line:
>> /home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:656: The
>> following error occurred while executing this line:
>> /home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:479: The
>> following error occurred while executing this line:
>> /home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:1578: Class not
>> found: javac1.8
>>
>> That release was only 2 years ago, and its not the only problem you
>> will hit. Besides build issues and stuff, I know at least Solr had a
>> wildcard import, conflicting with the newly introduced
>> java.util.Base64 that will prevent its compile. And I feel like there
>> have been numerous sneaky generics issues that only Uwe seems to
>> understand.
>>
>> Being able to build the old versions would require a good effort just
>> to figure out what build tools / compiler versions you need to do it
>> for the different timeframes, and git hashes aren't great if you want
>> to document that or try to make some fancy bisection tool.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
> --
> - Mark about.me/markrmiller

Re: Lucene/Solr git mirror will soon turn off

Posted by Mark Miller <ma...@gmail.com>.
Many old builds will also have problems even with a git checkout. If you
actually wanted to try and build them it would be much more sane to work
from the SVN history I'd hope we can retain.

Mark

On Fri, Dec 4, 2015 at 4:55 PM Robert Muir <rc...@gmail.com> wrote:

> On Fri, Dec 4, 2015 at 4:25 PM, Dawid Weiss <da...@gmail.com> wrote:
> >> I don't think jar files are 'history' and it was a mistake we had so
> >> many in source control before we cleaned that up. it is much better
> >> without them.
> >
> > Depends how you look at it. If your goal is to be able to actually
> > build ancient versions then dropping those JARs is going to be a real
> > pain. I think they should stay. Like I said, git is smart enough to
> > omit objects that aren't referenced from the cloned branch. The
> > conversion from SVN would have to be smart, but it's all doable.
>
> I mentioned this same issue the last thread where we discussed that, I
> do recommend to try to actually compile these old versions.
>
> As an experiment, I checked out the release tag for 4.2
> (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_0)
> and ran 'ant compile'
>
> BUILD FAILED
> /home/rmuir/lucene_solr_4_2_0/build.xml:107: The following error
> occurred while executing this line:
> /home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:656: The
> following error occurred while executing this line:
> /home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:479: The
> following error occurred while executing this line:
> /home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:1578: Class not
> found: javac1.8
>
> That release was only 2 years ago, and its not the only problem you
> will hit. Besides build issues and stuff, I know at least Solr had a
> wildcard import, conflicting with the newly introduced
> java.util.Base64 that will prevent its compile. And I feel like there
> have been numerous sneaky generics issues that only Uwe seems to
> understand.
>
> Being able to build the old versions would require a good effort just
> to figure out what build tools / compiler versions you need to do it
> for the different timeframes, and git hashes aren't great if you want
> to document that or try to make some fancy bisection tool.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
> --
- Mark
about.me/markrmiller

Re: Lucene/Solr git mirror will soon turn off

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Dec 4, 2015 at 4:25 PM, Dawid Weiss <da...@gmail.com> wrote:
>> I don't think jar files are 'history' and it was a mistake we had so
>> many in source control before we cleaned that up. it is much better
>> without them.
>
> Depends how you look at it. If your goal is to be able to actually
> build ancient versions then dropping those JARs is going to be a real
> pain. I think they should stay. Like I said, git is smart enough to
> omit objects that aren't referenced from the cloned branch. The
> conversion from SVN would have to be smart, but it's all doable.

I mentioned this same issue the last thread where we discussed that, I
do recommend to try to actually compile these old versions.

As an experiment, I checked out the release tag for 4.2
(http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_0)
and ran 'ant compile'

BUILD FAILED
/home/rmuir/lucene_solr_4_2_0/build.xml:107: The following error
occurred while executing this line:
/home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:656: The
following error occurred while executing this line:
/home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:479: The
following error occurred while executing this line:
/home/rmuir/lucene_solr_4_2_0/lucene/common-build.xml:1578: Class not
found: javac1.8

That release was only 2 years ago, and its not the only problem you
will hit. Besides build issues and stuff, I know at least Solr had a
wildcard import, conflicting with the newly introduced
java.util.Base64 that will prevent its compile. And I feel like there
have been numerous sneaky generics issues that only Uwe seems to
understand.

Being able to build the old versions would require a good effort just
to figure out what build tools / compiler versions you need to do it
for the different timeframes, and git hashes aren't great if you want
to document that or try to make some fancy bisection tool.


Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
> I don't think jar files are 'history' and it was a mistake we had so
> many in source control before we cleaned that up. it is much better
> without them.

Depends how you look at it. If your goal is to be able to actually
build ancient versions then dropping those JARs is going to be a real
pain. I think they should stay. Like I said, git is smart enough to
omit objects that aren't referenced from the cloned branch. The
conversion from SVN would have to be smart, but it's all doable.

> this bloats the repository, makes clone slow for someone new who just
> wants to check it out to work on it, etc.

No, not really. There is a dozen ways to do it without cloning the
full repo (provide a patch with --depth 1, clone a selective branch,
etc.). We've had that discussion before. I know you won't accept
rational arguments. :)

D.


Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
Oh, never mind -- I think I know why:

License
GNU Library or Lesser General Public License version 2.0 (LGPLv2)

D.

On Fri, Dec 4, 2015 at 10:33 PM, Dawid Weiss <da...@gmail.com> wrote:
> It'd be cool to actually reintegrate ancient CVS history as well (I
> think not all of it was moved to SVN).
>
> https://sourceforge.net/projects/lucene/
>
> D.
>
> On Fri, Dec 4, 2015 at 10:30 PM, Upayavira <uv...@odoko.co.uk> wrote:
>> Even if we moved to git and did an svn rm on
>> https://svn.apache.org/repos/asf/lucene/dev, the entire history of Lucene
>> would remain in the ASF Subversion repository. Nothing we can do to prevent
>> that!!
>>
>> Upayavira
>>
>> On Fri, Dec 4, 2015, at 09:26 PM, Gus Heck wrote:
>>
>> If we moved to git would a read only svn for older versions still exist? If
>> so no reason to keep any jars at all in git.
>>
>> On Dec 4, 2015 4:22 PM, "Robert Muir" <rc...@gmail.com> wrote:
>>
>> On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss <da...@gmail.com> wrote:
>>>> [...] several GBs unless we remove those JARs from our history.
>>>
>>> 1) History is important, don't dump it.
>>
>> I don't think jar files are 'history' and it was a mistake we had so
>> many in source control before we cleaned that up. it is much better
>> without them.
>>
>> this bloats the repository, makes clone slow for someone new who just
>> wants to check it out to work on it, etc.
>>
>> I wouldn't be surprised if it contributes to the system resources
>> issue at hand: which impacts *real history*
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>>


Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
It'd be cool to actually reintegrate ancient CVS history as well (I
think not all of it was moved to SVN).

https://sourceforge.net/projects/lucene/

D.

On Fri, Dec 4, 2015 at 10:30 PM, Upayavira <uv...@odoko.co.uk> wrote:
> Even if we moved to git and did an svn rm on
> https://svn.apache.org/repos/asf/lucene/dev, the entire history of Lucene
> would remain in the ASF Subversion repository. Nothing we can do to prevent
> that!!
>
> Upayavira
>
> On Fri, Dec 4, 2015, at 09:26 PM, Gus Heck wrote:
>
> If we moved to git would a read only svn for older versions still exist? If
> so no reason to keep any jars at all in git.
>
> On Dec 4, 2015 4:22 PM, "Robert Muir" <rc...@gmail.com> wrote:
>
> On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss <da...@gmail.com> wrote:
>>> [...] several GBs unless we remove those JARs from our history.
>>
>> 1) History is important, don't dump it.
>
> I don't think jar files are 'history' and it was a mistake we had so
> many in source control before we cleaned that up. it is much better
> without them.
>
> this bloats the repository, makes clone slow for someone new who just
> wants to check it out to work on it, etc.
>
> I wouldn't be surprised if it contributes to the system resources
> issue at hand: which impacts *real history*
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
>


Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
Even if we moved to git and did an svn rm on https://svn.apache.org/repos/asf/lucene/dev, the entire history of Lucene would remain in the ASF Subversion repository. Nothing we can do to prevent that!!

Upayavira

On Fri, Dec 4, 2015, at 09:26 PM, Gus Heck wrote:
> If we moved to git would a read only svn for older versions still
> exist? If so no reason to keep any jars at all in git.


> On Dec 4, 2015 4:22 PM, "Robert Muir" <rc...@gmail.com> wrote:
>> On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss <da...@gmail.com> wrote:
>>>> [...] several GBs unless we remove those JARs from our history.
>>>
>>> 1) History is important, don't dump it.
>>
>> I don't think jar files are 'history' and it was a mistake we had so
>> many in source control before we cleaned that up. it is much better
>> without them.
>>
>> this bloats the repository, makes clone slow for someone new who just
>> wants to check it out to work on it, etc.
>>
>> I wouldn't be surprised if it contributes to the system resources
>> issue at hand: which impacts *real history*
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>

Re: Lucene/Solr git mirror will soon turn off

Posted by Gus Heck <gu...@gmail.com>.
If we moved to git would a read only svn for older versions still exist? If
so no reason to keep any jars at all in git.
On Dec 4, 2015 4:22 PM, "Robert Muir" <rc...@gmail.com> wrote:

> On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss <da...@gmail.com> wrote:
> >> [...] several GBs unless we remove those JARs from our history.
> >
> > 1) History is important, don't dump it.
>
> I don't think jar files are 'history' and it was a mistake we had so
> many in source control before we cleaned that up. it is much better
> without them.
>
> this bloats the repository, makes clone slow for someone new who just
> wants to check it out to work on it, etc.
>
> I wouldn't be surprised if it contributes to the system resources
> issue at hand: which impacts *real history*
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Lucene/Solr git mirror will soon turn off

Posted by Robert Muir <rc...@gmail.com>.
On Fri, Dec 4, 2015 at 4:14 PM, Dawid Weiss <da...@gmail.com> wrote:
>> [...] several GBs unless we remove those JARs from our history.
>
> 1) History is important, don't dump it.

I don't think jar files are 'history' and it was a mistake we had so
many in source control before we cleaned that up. it is much better
without them.

this bloats the repository, makes clone slow for someone new who just
wants to check it out to work on it, etc.

I wouldn't be surprised if it contributes to the system resources
issue at hand: which impacts *real history*
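
To put a number on how much of the history is JAR files, one option is to list every object reachable from any ref together with its size and add up the blobs whose path ends in .jar. The following is only a sketch: sum_jar_bytes is a hypothetical helper name, and it assumes input lines of the form "<type> <sha> <size> <path>", which is what the git pipeline shown in the trailing comment produces.

```shell
#!/bin/sh
# Hypothetical helper: reads "<type> <sha> <size> <path>" lines on stdin
# and prints the total size in bytes of all blobs whose path ends in .jar.
sum_jar_bytes() {
    awk '$1 == "blob" && $NF ~ /\.jar$/ { total += $3 }
         END { print total + 0 }'
}

# Intended use, run inside a full clone (not executed here):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     sum_jar_bytes
```

The figure is the uncompressed size of the JAR blobs, so it is an upper bound on what they cost in a packed repository, not the exact clone overhead.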


Re: Lucene/Solr git mirror will soon turn off

Posted by Dawid Weiss <da...@gmail.com>.
> [...] several GBs unless we remove those JARs from our history.

1) History is important, don't dump it.
2) git isn't dumb -- git clone -b master --single-branch would only
fetch what's actually needed/ referenced. We could split the history
into "pre-ivy" and "post-ivy" branches so that fetching master is at
nearly no-cost, but if somebody wishes to they can still fetch
everything (I would, it's a one-time thing, typically).
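
The two ways of avoiding a full fetch can be sketched as small wrappers around git clone. The function names, the GIT override variable, and the URL in the usage comment are illustrative assumptions; the flags themselves (--branch, --single-branch, --depth) are standard git clone options.

```shell
#!/bin/sh
# GIT can be overridden (for example GIT=echo) to print the command
# instead of running it. This variable is an assumption of the sketch.
GIT="${GIT:-git}"

# Fetch only the history reachable from one branch.
# $1 = repository URL, $2 = branch name, $3 = target directory
clone_single_branch() {
    "$GIT" clone --branch "$2" --single-branch "$1" "$3"
}

# Fetch only the newest commit of one branch (a shallow clone).
# --depth implies --single-branch unless --no-single-branch is given.
# $1 = repository URL, $2 = branch name, $3 = target directory
clone_shallow() {
    "$GIT" clone --depth 1 --branch "$2" "$1" "$3"
}

# Example (placeholder URL, not executed here):
#   clone_single_branch https://example.invalid/lucene-solr.git master lucene-solr
```

A shallow clone is enough to prepare a patch; the single-branch clone keeps the full history of that branch but skips objects that only other branches reference.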

Dawid


Re: Lucene/Solr git mirror will soon turn off

Posted by Mike Drob <md...@apache.org>.
> Does anyone know of a link to this git-svn issue?  Is it a known
> issue?  If there's something simple we can do (remove old jars from
> our svn history, remove old branches), maybe we can sidestep the issue
> and infra will allow it to keep running?

I believe it is partially covered under
https://issues.apache.org/jira/browse/INFRA-9182

On Fri, Dec 4, 2015 at 2:57 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
>

Re: Lucene/Solr git mirror will soon turn off

Posted by Geoffrey Corey <ge...@gmail.com>.
If you pull from ASF git (git.a.o) or github, you are not using git-svn at
all, bypassing the actual git-svn problem.

Check out
https://github.com/apache/infrastructure-puppet/tree/deployment/modules/git_mirror_asf
for what we use, specifically the update-mirror.sh script. That is what
svn2gitupdate runs when a pubsub event happens.
On Dec 8, 2015 1:00 PM, "Upayavira" <uv...@odoko.co.uk> wrote:

> Here's what I've just got on the Infra hipchat channel:
>
> The ASF has a tool, svn2gitupdate[1], which I presume uses git-svn, which
> fails periodically. When it does fail, it takes with it all other ASF
> projects that make use of the same tool, until an admin can intervene and
> restart things.
>
> When it fails, it OOMs, and blocks all disk activity.
>
> If someone wanted to reproduce this issue, you could:
>  * create a 4Gb VM
>  * Install svn2gitupdate from [1]
>  * Clone the Lucene git repo from ASF git or github
>  * Run the tool repeatedly until it fails
>    - it is the pull from SVN that fails, not the push to git, so we don't
> need a remote git server
>
> The other option is just switching to Git. Now, given the issue is with
> reading from SVN, not writing to Git, Infrastructure *would* be able to
> give us a decent SVN->Git export - even if they had to rerun the process a
> number of times, this would be acceptable as a one-off task.
>
> So it seems we have two options:
> 1) Set up a VM and debug reading from SVN
> 2) Just migrate to Git and be done with it.
>
> Thoughts? Volunteers?
>
> Upayavira
>
> [1]
> https://svn.apache.org/repos/infra/infrastructure/trunk/projects/git/svn2gitupdate/
>
>
> On Tue, Dec 8, 2015, at 08:49 PM, Geoffrey Corey wrote:
>
> If you do that, then the changes do not sync to github, and there's a 99%
> chance that the next time a change is seen by the mirroring process (or by
> the hourly cron that updates all the svn->git mirrors) the same memory leak
> would happen.
>
> On Tue, Dec 8, 2015 at 12:40 PM, Scott Blum <dr...@gmail.com> wrote:
>
> Dumb question, but searching around suggests that git-svn can be killed
> and then resumed with `git svn fetch`.  Shouldn't that resolve any
> process-level memory leak?
>
> On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
> Hello devs,
>
> The infra team has notified us (Lucene/Solr) that in 26 days our
> git-svn mirror will be turned off, because running it consumes too
> many system resources, affecting other projects, apparently because of
> a memory leak in git-svn.
>
> Does anyone know of a link to this git-svn issue?  Is it a known
> issue?  If there's something simple we can do (remove old jars from
> our svn history, remove old branches), maybe we can sidestep the issue
> and infra will allow it to keep running?
>
> Or maybe someone in the Lucene/Solr dev community with prior
> experience with git-svn could volunteer to play with it to see if
> there's a viable solution, maybe with command-line options e.g. to
> only mirror specific branches (trunk, 5.x)?
>
> Or maybe it's time for us to switch to git, but there are problems
> there too, e.g. we are currently missing large parts of our svn
> history from the mirror now and it's not clear whether that would be
> fixed if we switched:
> https://issues.apache.org/jira/browse/INFRA-10828  Also, because we
> used to add JAR files to svn, the "git clone" would likely take
> several GBs unless we remove those JARs from our history.
>
> Or if anyone has any other ideas, we should explore them, because
> otherwise in 26 days there will be no more updates to the git mirror
> of Lucene and Solr sources...
>
> Thanks,
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
>

Re: Lucene/Solr git mirror will soon turn off

Posted by Upayavira <uv...@odoko.co.uk>.
Here's what I've just got on the Infra hipchat channel:

The ASF has a tool, svn2gitupdate[1], which I presume uses git-svn,
which fails periodically. When it does fail, it takes with it all other
ASF projects that make use of the same tool, until an admin can
intervene and restart things.

When it fails, it OOMs, and blocks all disk activity.

If someone wanted to reproduce this issue, you could:
 * create a 4Gb VM
 * Install svn2gitupdate from [1]
 * Clone the Lucene git repo from ASF git or github
 * Run the tool repeatedly until it fails
   - it is the pull from SVN that fails, not the push to git, so we
     don't need a remote git server
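
The last step, running the tool repeatedly until it fails, could be driven by a small loop like the one below. This is a sketch only: run_until_failure is a hypothetical helper, and the wrapper script named in the usage comment stands in for however svn2gitupdate is actually invoked, which is not described here.

```shell
#!/bin/sh
# Hypothetical helper: repeat a command until it exits non-zero.
# $1 = maximum number of runs; remaining arguments = command to repeat.
# Returns 0 once a failure has been observed, 1 if none occurred.
run_until_failure() {
    max_runs=$1
    shift
    run=1
    while [ "$run" -le "$max_runs" ]; do
        if ! "$@"; then
            echo "failed on run $run"
            return 0
        fi
        run=$((run + 1))
    done
    echo "no failure within $max_runs runs"
    return 1
}

# Example (placeholder script name, not executed here):
#   run_until_failure 1000 ./run-svn2gitupdate.sh
```

Since the reported failure is an OOM that blocks disk activity, it is safer to run this on a throwaway VM than on a shared machine.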


The other option is just switching to Git. Now, given the issue is with
reading from SVN, not writing to Git, Infrastructure *would* be able to
give us a decent SVN->Git export - even if they had to rerun the process
a number of times, this would be acceptable as a one-off task.

So it seems we have two options:
1) Set up a VM and debug reading from SVN
2) Just migrate to Git and be done with it.

Thoughts? Volunteers?

Upayavira

[1] https://svn.apache.org/repos/infra/infrastructure/trunk/projects/git/svn2gitupdate/


On Tue, Dec 8, 2015, at 08:49 PM, Geoffrey Corey wrote:
> If you do that, then the changes do not sync to github, and there's a
> 99% chance that the next time a change is seen by the mirroring
> process (or by the hourly cron that updates all the svn->git mirrors)
> the same memory leak would happen.
>
> On Tue, Dec 8, 2015 at 12:40 PM, Scott Blum
> <dr...@gmail.com> wrote:
>> Dumb question, but searching around suggests that git-svn can be
>> killed and then resumed with `git svn fetch`.  Shouldn't that resolve
>> any process-level memory leak?
>>
>> On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> Hello devs,
>>>
>>> The infra team has notified us (Lucene/Solr) that in 26 days our
>>> git-svn mirror will be turned off, because running it consumes too
>>> many system resources, affecting other projects, apparently because of
>>> a memory leak in git-svn.
>>>
>>> Does anyone know of a link to this git-svn issue?  Is it a known
>>> issue?  If there's something simple we can do (remove old jars from
>>> our svn history, remove old branches), maybe we can sidestep the issue
>>> and infra will allow it to keep running?
>>>
>>> Or maybe someone in the Lucene/Solr dev community with prior
>>> experience with git-svn could volunteer to play with it to see if
>>> there's a viable solution, maybe with command-line options e.g. to
>>> only mirror specific branches (trunk, 5.x)?
>>>
>>> Or maybe it's time for us to switch to git, but there are problems
>>> there too, e.g. we are currently missing large parts of our svn
>>> history from the mirror now and it's not clear whether that would be
>>> fixed if we switched:
>>> https://issues.apache.org/jira/browse/INFRA-10828  Also, because we
>>> used to add JAR files to svn, the "git clone" would likely take
>>> several GBs unless we remove those JARs from our history.
>>>
>>> Or if anyone has any other ideas, we should explore them, because
>>> otherwise in 26 days there will be no more updates to the git mirror
>>> of Lucene and Solr sources...
>>>
>>> Thanks,
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: dev-help@lucene.apache.org

Re: Lucene/Solr git mirror will soon turn off

Posted by Geoffrey Corey <co...@apache.org>.
If you do that, then the changes do not sync to github, and there's a 99%
chance that the next time a change is seen by the mirroring process (or by
the hourly cron that updates all the svn->git mirrors) the same memory leak
would happen.

On Tue, Dec 8, 2015 at 12:40 PM, Scott Blum <dr...@gmail.com> wrote:

> Dumb question, but searching around suggests that git-svn can be killed
> and then resumed with `git svn fetch`.  Shouldn't that resolve any
> process-level memory leak?
>
> On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Hello devs,
>>
>> The infra team has notified us (Lucene/Solr) that in 26 days our
>> git-svn mirror will be turned off, because running it consumes too
>> many system resources, affecting other projects, apparently because of
>> a memory leak in git-svn.
>>
>> Does anyone know of a link to this git-svn issue?  Is it a known
>> issue?  If there's something simple we can do (remove old jars from
>> our svn history, remove old branches), maybe we can sidestep the issue
>> and infra will allow it to keep running?
>>
>> Or maybe someone in the Lucene/Solr dev community with prior
>> experience with git-svn could volunteer to play with it to see if
>> there's a viable solution, maybe with command-line options e.g. to
>> only mirror specific branches (trunk, 5.x)?
>>
>> Or maybe it's time for us to switch to git, but there are problems
>> there too, e.g. we are currently missing large parts of our svn
>> history from the mirror now and it's not clear whether that would be
>> fixed if we switched:
>> https://issues.apache.org/jira/browse/INFRA-10828  Also, because we
>> used to add JAR files to svn, the "git clone" would likely take
>> several GBs unless we remove those JARs from our history.
>>
>> Or if anyone has any other ideas, we should explore them, because
>> otherwise in 26 days there will be no more updates to the git mirror
>> of Lucene and Solr sources...
>>
>> Thanks,
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>

Re: Lucene/Solr git mirror will soon turn off

Posted by Scott Blum <dr...@gmail.com>.
Dumb question, but searching around suggests that git-svn can be killed and
then resumed with `git svn fetch`.  Shouldn't that resolve any
process-level memory leak?
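
The kill-and-resume idea could look roughly like the loop below: keep re-running a command until it succeeds, up to a fixed number of attempts. run_until_done is a hypothetical helper, and the memory cap in the usage comment relies on ulimit -v, which bash and several other shells provide but POSIX does not require. It only restarts the fetch; it does not address whatever makes the process grow in the first place.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command until it exits with status 0.
# $1 = maximum number of attempts; remaining arguments = command to run.
# Returns 0 on the first success, 1 if every attempt failed.
run_until_done() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt failed; retrying" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not executed here): cap virtual memory at about 2 GB per
# attempt so a leaking process is killed early, then resume the fetch.
#   run_until_done 20 sh -c 'ulimit -v 2097152; exec git svn fetch'
```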

On Fri, Dec 4, 2015 at 3:57 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Hello devs,
>
> The infra team has notified us (Lucene/Solr) that in 26 days our
> git-svn mirror will be turned off, because running it consumes too
> many system resources, affecting other projects, apparently because of
> a memory leak in git-svn.
>
> Does anyone know of a link to this git-svn issue?  Is it a known
> issue?  If there's something simple we can do (remove old jars from
> our svn history, remove old branches), maybe we can sidestep the issue
> and infra will allow it to keep running?
>
> Or maybe someone in the Lucene/Solr dev community with prior
> experience with git-svn could volunteer to play with it to see if
> there's a viable solution, maybe with command-line options e.g. to
> only mirror specific branches (trunk, 5.x)?
>
> Or maybe it's time for us to switch to git, but there are problems
> there too, e.g. we are currently missing large parts of our svn
> history from the mirror now and it's not clear whether that would be
> fixed if we switched:
> https://issues.apache.org/jira/browse/INFRA-10828  Also, because we
> used to add JAR files to svn, the "git clone" would likely take
> several GBs unless we remove those JARs from our history.
>
> Or if anyone has any other ideas, we should explore them, because
> otherwise in 26 days there will be no more updates to the git mirror
> of Lucene and Solr sources...
>
> Thanks,
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>