Posted to dev@gora.apache.org by Julien Nioche <li...@gmail.com> on 2011/08/09 17:10:12 UTC

Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Hi Kirby,

> Grumble, Grumble.  (adding dev@nutch, as that is more than likely
> where this discussion really belongs)...
>

am adding gora-dev@incubator.apache.org as well


> It'd be really nice if folks could just follow the commands in the
> nightly build, and get a build pushed out.  I've pointed this out
> previously, and was told this would be fixed "shortly" (right after
> GORA-0.1 finally got released, but not published in public maven repo,
> which as far as I know, it still isn't published, but I stopped
> checking on it).
>

I understand and share your frustration, however you need to bear in mind
that things are done only if people volunteer and have time - usually taken
from their holiday, weekends, evenings. Chris (who is the de facto release
master for Nutch and Gora) has not had the time and nobody else has
volunteered to do it.


> As it happens, yesterday was the 1 year anniversary of the last
> successful Hudson/Jenkins build...  If that actually worked, we could
> point people towards it as a useful recipe for how to get a build
> working off trunk.  I haven't been following Nutch too closely, but it
> always strikes me as really odd, that there's a nightly build and it
> doesn't bother anybody that it fails all the time (and that there
> isn't a nightly build for the stable branches).
>

The real issue behind all this is what we should do with Nutch 2.0. What
follows is only my opinion and I would love to hear what others have to say
on this subject.

Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
Gora, the latter hasn't really taken off since incubation. There have been
some modest contributions to it but it does not seem to be used much and
there is virtually nothing happening on it in terms of development. More
worryingly, the people who initially contributed to it are no longer very
active on the project (such is life, new jobs, different projects, etc.).
As for Nutch 2.0, it hasn't made any progress in the last 12 months: we
still have the same bugs, the tests do not work, the build has to be done
manually, etc.

At the same time, Nutch as a whole has been given a new lease of life:
there is definitely more activity on the mailing lists, new users, new
active committers, etc., and quite a few bugfixes and improvements, most
of them backported from what had been done in the trunk. People seem
fairly happy with what we can do with 1.4.

So the question is: what shall we do with 2.0? Here are a few possibilities:

a) put some effort into it, fix the bugs and make it so that it can be used
instead of 1.x
b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
again
c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
branches is quite a pain)
d) abandon the idea of a neutral storage layer with Gora and hardwire it to
e.g. HBase

Option (a) has not happened in the last 12 months and I am not very hopeful
about it.

What do you guys think?

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi,

Without changing the flow of conversation and the points which have already
been touched upon, I would like to add:

I am really split here between a couple of decisions. I like the abstraction
that Gora provides, even though it is somewhat of a pain to configure, which
also presents a barrier to adoption for devs. That being said, Gora is a
fundamental component of Nutch 2.0, and once you get to grips with the
config and the flexibility it offers you are presented with an excellent
setup for Nutch 2.0. I understand people's concerns and why they would wish
to hardwire to HBase; however, I would like to point to a (rather lengthy)
thread I found last night as I was thinking about my position in this whole
affair [1]. In essence it reflects exactly what Julien has mentioned below,
as well as adding a hellish lot more!

I am also with Markus on this one. However, there is no point in me being
anything other than totally honest: some of the bugs in trunk 2.0 we are
talking about are pretty substantial (I don't even know them all),
especially when the API changes are taken into account, so I would be
learning as I chipped in my part. Bearing in mind several devs' other
commitments both in and out of the ASF, this would inevitably lead to
slower progress on Nutch 2.0 than we would all hope for. Is this something
which can be tolerated, or are we to put suggestions in place which adhere
to the "release early, release often" ethos and try to get something out of
the door? If we could get an official release of Nutch 2.0, community
testing could commence, and instead of improvement suggestions ending up in
JIRA tickets we would be getting bugs filed specifically against 2.0 as
independent issues. This would inevitably lead to a better trunk
development environment for us all. One downside of veering towards option
A) is that we had a small amount of resistance when Nutch 1.3 was
released... would making Nutch 2.0 mainstream, the de facto version for
Nutch users, be a step too far for some of them?

I am a firm believer that we should do whatever necessary to get trunk
building under Hudson. It seems like a waste of resources that we have the
potential to have a stable build environment but it is not being taken
advantage of. Obviously I am unaware of exactly what is preventing this,
hence my keenness to get it sorted out, but surely we all must agree that
this would be beneficial, from a mental point of view as well. If we see
that trunk is building successfully then there might be a better feeling
about people developing not only on trunk 2.0 but also on Gora and other
components upon which trunk 2.0 depends.

Further to this, is there any consensus on getting a Jenkins build
established for branch 1.X? It is quite clear that this is our working
development strand, so would this not make sense? I have been looking
through the wiki [2], and any committer can get it set up once the PMC chair
makes some minor requests on people.apache.org.

Finally, with regards to the ant/ivy configuration, I am quite happy with
the current setup. If someone puts forward a reasonable argument for
changing to ant/maven or any other configuration then I will certainly be
interested, provided it adds value to the project. I must agree that
changing something which is not broken is far from the direction I had
envisaged we were moving in... quite the opposite, in fact.

[1] http://www.mail-archive.com/dev@nutch.apache.org/msg00216.html
[2] http://wiki.apache.org/general/Hudson


On Wed, Aug 10, 2011 at 10:20 AM, Markus Jelsma <ma...@openindex.io> wrote:

> Julien, devs, users,
>
> I'd like to see bugs fixed in 2.0 but some of them are way out of my
> league or would cost me an absurd amount of time. I'd also really like to
> use Gora but Gora must be maintained. Gora will play a fundamental role in
> 2.0 and if something is broken there it is not trivial to fix it for us
> Nutch devs as it is yet another component to worry about.
>
> Tika goes well, it's worked on and there is good enough progress to rely
> on from our perspective. If this is not going to be the case with Gora we
> should maybe decide to drop it and hardwire HBASE in it.
>
> Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but i'm not
> sure the currently active Nutch devs are going to fix it just like that.
>
> Cheers,
>
>
> >
> > a) put some effort into it, fix the bugs and make so that it can be used
> > instead of 1.x
> > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > trunk again
> > c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> > branches is quite a pain)
> > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > to e.g. HBase
> >
> > Option (a) has not happened in the last 12 months and I am not very
> > hopeful about it.
> >
> > What do you guys think?
> >
> > Julien
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>



-- 
*Lewis*

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Markus Jelsma <ma...@openindex.io>.
Julien, devs, users,

I'd like to see bugs fixed in 2.0 but some of them are way out of my league or 
would cost me an absurd amount of time. I'd also really like to use Gora but 
Gora must be maintained. Gora will play a fundamental role in 2.0 and if 
something is broken there it is not trivial to fix it for us Nutch devs as it 
is yet another component to worry about.

Tika goes well, it's worked on and there is good enough progress to rely on 
from our perspective. If this is not going to be the case with Gora we should 
maybe decide to drop it and hardwire HBASE in it.

Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but I'm not
sure the currently active Nutch devs are going to fix it just like that.

Cheers,

On Tuesday 09 August 2011 17:10:12 Julien Nioche wrote:
> Hi Kirby,
> 
> > Grumble, Grumble.  (adding dev@nutch, as that is more than likely
> > where this discussion really belongs)...
> 
> am adding gora-dev@incubator.apache.org as well
> 
> > It'd be really nice if folks could just follow the commands in the
> > nightly build, and get a build pushed out.  I've pointed this out
> > previously, and was told this would be fixed "shortly" (right after
> > GORA-0.1 finally got released, but not published in public maven repo,
> > which as far as I know, it still isn't published, but I stopped
> > checking on it).
> 
> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.
> 
> > As it happens, yesterday was the 1 year anniversary of the last
> > successful Hudson/Jenkins build...  If that actually worked, we could
> > point people towards it as a useful recipe for how to get a build
> > working off trunk.  I haven't been following Nutch too closely, but it
> > always strikes me as really odd, that there's a nightly build and it
> > doesn't bother anybody that it fails all the time (and that there
> > isn't a nightly build for the stable branches).
> 
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
> 
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...
> 
> At the same time, there has been a new lease of life into Nutch as a whole:
> there is definitely more activity on the mailing lists, new users, new
> active committers  etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4
> 
> So the question is : what shall we do with 2.0? Here are a few
> possibilities:
> 
> a) put some effort into it, fix the bugs and make so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
> 
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
> 
> What do you guys think?
> 
> Julien

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Julien Nioche <li...@gmail.com>.
Hi Tom,


>  I have been using Nutch 1.x for the last 9 months or so and it works well
> for large scale crawls up to around a billion pages. However, the inherent
> lack of random access in HDFS really starts to become a burden on our hadoop
> cluster when going through the whole generate/update/fetch cycle. Being able
> to circumvent HDFS and store data directly in Cassandra/HBase/SQL via GORA
> is an exciting development in Nutch 2, so I have an interest in making it
> succeed.
>

I assume that you are referring to the fact that after a while the
generation and update steps end up taking most of the time compared to the
fetching / parsing. One way around this is to generate multiple segments in
a single generate and update them all with the crawldb in one go, see the
options for the Generator.
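
As a rough sketch of that approach (not an official recipe), the Python
helper below only assembles the two command lines. It assumes that the 1.x
`bin/nutch generate` command accepts `-topN` and `-maxNumSegments`, and that
`updatedb` accepts `-dir` to take every segment under a directory; please
check the usage output of your own version. The paths and numbers are made
up for illustration.

```python
from typing import List


def generate_cmd(crawldb: str, segments_dir: str, top_n: int,
                 max_segments: int) -> List[str]:
    """Build a single generate call that may produce several segments.

    The -topN and -maxNumSegments flags are assumptions about the
    1.x Generator options; verify them against your Nutch version.
    """
    return [
        "bin/nutch", "generate", crawldb, segments_dir,
        "-topN", str(top_n),
        "-maxNumSegments", str(max_segments),
    ]


def updatedb_cmd(crawldb: str, segments_dir: str) -> List[str]:
    """Build one updatedb call covering all segments under segments_dir.

    The -dir flag is likewise an assumption to be checked locally.
    """
    return ["bin/nutch", "updatedb", crawldb, "-dir", segments_dir]


if __name__ == "__main__":
    # Hypothetical paths and sizes, for illustration only.
    print(" ".join(generate_cmd("crawl/crawldb", "crawl/segments", 50000, 4)))
    print(" ".join(updatedb_cmd("crawl/crawldb", "crawl/segments")))
```

The segments would still be fetched and parsed one by one between the two
calls; the point is only that the crawldb is read and rewritten once per
batch of segments rather than once per segment.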


>
>
> That said, I too, have been frustrated by the state of affairs on Nutch 2.
> I am willing to help.
>

Good to hear that.


> I see that Nutch is mainly an ant/ivy build process, but  there is an
> attempt at using Maven? IMO, ant/ivy seems a bit dated and I am really much
> more comfortable working with Maven. Would there be an interest in
> completely moving to Maven as the build tool of choice?
>

[Oh no, one of these endless discussions again :-( ] The consensus among the
people actively involved in the project was that ANT+IVY was a better option
than plain Maven, due notably to the fact that the ANT scripts were already
written and the effort could be used in a more fruitful way doing something
else. There are comments on the mailing lists from people who are used to
Maven but some of them seem to be happy with the pom file used to publish
the artefacts, while others end up using IvyDE for Eclipse and the ANT
scripts and realise that it works fine. I don't think that Ivy is dated at
all and, again, would rather see people contributing useful code instead of
spending time trying to fix things that are not broken.

I'd personally be completely against using Maven on its own but would
consider ANT + Maven tasks for managing the modules and dependencies and the
publication of artefacts. We currently have Ivy for the dependencies and
modules and Maven for the publication; the Maven tasks could be used for
both, which would simplify things a little bit while preserving most of the
ANT script. As usual, suggestions and contributions are welcome.

Julien

RE: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Tom Davidson <td...@covario.com>.
Hi All,

I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our hadoop cluster when going through the whole generate/update/fetch cycle. Being able to circumvent HDFS and store data directly in Cassandra/HBase/SQL via GORA is an exciting development in Nutch 2, so I have an interest in making it succeed.

That said, I, too, have been frustrated by the state of affairs on Nutch 2. I am willing to help. I see that Nutch is mainly an ant/ivy build process, but there is an attempt at using Maven? IMO, ant/ivy seems a bit dated and I am really much more comfortable working with Maven. Would there be an interest in completely moving to Maven as the build tool of choice?

From: Kirby Bohling [mailto:kirby.bohling@gmail.com]
Sent: Tuesday, August 09, 2011 8:31 AM
To: dev@nutch.apache.org
Cc: gora-dev@incubator.apache.org
Subject: Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Julien,

On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche <li...@gmail.com> wrote:
Hi Kirby,

Grumble, Grumble.  (adding dev@nutch, as that is more than likely
where this discussion really belongs)...

am adding gora-dev@incubator.apache.org as well

It'd be really nice if folks could just follow the commands in the
nightly build, and get a build pushed out.  I've pointed this out
previously, and was told this would be fixed "shortly" (right after
GORA-0.1 finally got released, but not published in public maven repo,
which as far as I know, it still isn't published, but I stopped
checking on it).

I understand and share your frustration, however you need to bear in mind that things are done only if people volunteer and have time - usually taken from their holiday, weekends, evenings. Chris (who is the de facto release master for Nutch and Gora) has not had the time and nobody else has volunteered to do it.

   I don't mean to be a complainer. I'd happily try and contribute fixes on this one, but most of this would likely have to be done on Hudson/Jenkins.  I think you're addressing a larger issue than I really meant.  My point was, somehow a developer does a build on their desktop, and however that is done should be duplicated on Hudson/Jenkins.  If you need the trunk of gora, then is it possible to check it out, build it and install it to a local repo, and then build Nutch via Hudson/Jenkins?  Whatever it takes to get a build should be what the CI server is doing.  The repeatable but failing builds are what really confuse and frustrate me.  The nightly/CI build should be automating what devs do on their desktop to ensure it'll work on a clean setup.  Right now, it just tells you that for the last year, the totally obvious steps will lead to a failure.

   I can figure out all of the configuration issues for Hudson/Jenkins to make it work, if somebody can push that into the Apache version.  However, I think answering your questions first would be a good idea.  My totally non-binding +1 for setting up a CI/Nightly build for the various stable branches too, the only one I found on Apache was for trunk.
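
   To make the "CI should do exactly what a dev does on the desktop" point a bit more concrete, here is a minimal sketch in Python of an ordered, fail-fast build sequence. The step names follow the order described above (gora trunk first, then Nutch); the commands themselves are placeholders, since the real checkout and build commands are not spelled out in this thread, and `run_steps` is a hypothetical helper, not part of any existing tool.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# A step is a human-readable name plus the command line to run.
Step = Tuple[str, Sequence[str]]


def run_steps(steps: Sequence[Step]) -> List[str]:
    """Run each step in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    done: List[str] = []
    for name, cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # A later step is pointless if an earlier one failed.
            break
        done.append(name)
    return done


if __name__ == "__main__":
    # Placeholder command that always succeeds; substitute the real
    # checkout/build commands used on a developer desktop.
    ok = [sys.executable, "-c", "pass"]
    steps = [
        ("check out gora trunk", ok),
        ("build gora and install it to a local repo", ok),
        ("build nutch trunk", ok),
    ]
    print(run_steps(steps))
```

   The same list of steps could then be used both on a desktop and on the CI server, so that a failing nightly build points at the first step that actually broke.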

As it happens, yesterday was the 1 year anniversary of the last
successful Hudson/Jenkins build...  If that actually worked, we could
point people towards it as a useful recipe for how to get a build
working off trunk.  I haven't been following Nutch too closely, but it
always strikes me as really odd, that there's a nightly build and it
doesn't bother anybody that it fails all the time (and that there
isn't a nightly build for the stable branches).

The real issue behind all this is what we should do with Nutch 2.0. What follows is only my opinion and I would love to hear what others have to say on this subject.

Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to Gora, the latter hasn't really taken off since incubation. There have been some modest contributions to it but it does not seem to be used much and there is virtually nothing happening on it in terms of development. More worryingly, the people who initially contributed to it are not very active on the project (such is life, new jobs, different projects, etc...) anymore. As for Nutch 2.0, it hasn't made any progress in the last 12 months : we still have the same bugs, the tests do not work, the build has to be done manually etc...

At the same time, there has been a new lease of life into Nutch as a whole : there is definitely more activity on the mailing lists, new users, new active committers  etc... and quite a few bugfixes and improvements - most of them backported from what had been done in the trunk and people seem fairly happy with what we can do with 1.4

So the question is : what shall we do with 2.0? Here are a few possibilities :

a) put some effort into it, fix the bugs and make so that it can be used instead of 1.x
b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk again
c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two branches is quite a pain)
d) abandon the idea of a neutral storage layer with Gora and hardwire it to e.g. HBase

Option (a) has not happened in the last 12 months and I am not very hopeful about it.

What do you guys think?

   I know nothing about the 2.0 branch, and can't really contribute to that conversation (that job issue interferes with all my free time).

    Kirby

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com


RE: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Tom Davidson <td...@covario.com>.
Hi All,

I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our hadoop cluster when going through the whole generate/update/fetch cycle. Being able to circumvent HDFS and store data directly in Cassandra/HBase/SQL via GORA is an exciting development in Nutch 2, so I have an interest in making it succeed.

That said, I too, have been frustrated by the state of affairs on Nutch 2.  I am willing to help. I see that Nutch is mainly an ant/ivy build process, but  there is an attempt at using Maven? IMO, ant/ivy seems a bit dated and I am really much more comfortable working with Maven. Would there be an interest in completely moving to Maven as the build tool of choice?

From: Kirby Bohling [mailto:kirby.bohling@gmail.com]
Sent: Tuesday, August 09, 2011 8:31 AM
To: dev@nutch.apache.org
Cc: gora-dev@incubator.apache.org
Subject: Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Julien,

On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche <li...@gmail.com>> wrote:
Hi Kirby,

Grumble, Grumble.  (adding dev@nutch, as that is more than likely
where this discussion really belongs)...

am adding gora-dev@incubator.apache.org<ma...@incubator.apache.org> as well

It'd be really nice if folks could just follow the commands in the
nightly build, and get a build pushed out.  I've pointed this out
previously, and was told this would be fixed "shortly" (right after
GORA-0.1 finally got released, but not published in public maven repo,
which as far as I know, it still isn't published, but I stopped
checking on it).

I understand and share your frustration, however you need to bear in mind that things are done only if people volunteer and have time - usually taken from their holiday, weekends, evenings. Chris (who is the de facto release master for Nutch and Gora) has not had the time and nobody else has volunteered to do it.

   I don't mean to be a complainer, I'd happily try and contribute fixes on this one, but most of this would likely have to be done on Hudson/Jenkins.  I think you're addressing a larger issue than I really meant.  My point was, somehow a developer does a build on their desktop, and however that is done should be duplicated on Hudson/Jenkins.  If you need the trunk of gora, then is it possible to checkout it out, build it and install it to a local repo, and then build Nutch via Hudson/Jenkins?  Whatever it takes to get a build should be what the CI server is doing.  The repeatable, but failing builds is what really confuses and frustrates me.  The nightly/CI build should be automating what devs on their desktop to ensure it'll work on a clean setup.  Right now, it just tells you that for the last year, the totally obvious steps will lead to a failure.

   I can figure out all of the configuration issues for Hudson/Jenkins to make it work, if somebody can push that into the Apache version.  However, I think answering your questions first would be a good idea.  My totally non-binding +1 for setting up a CI/Nightly build for the various stable branches too, the only one I found on Apache was for trunk.

As it happens, yesterday was the 1 year anniversary of the last
successful Hudson/Jenkins build...  If that actually worked, we could
point people towards it as a useful recipe for how to get a build
working off trunk.  I haven't been following Nutch too closely, but it
always strikes me as really odd, that there's a nightly build and it
doesn't bother anybody that it fails all the time (and that there
isn't a nightly build for the stable branches).

The real issue behind all this is what we should do with Nutch 2.0. What follows is only my opinion and I would love to hear what others have to say on this subject.

Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to Gora, the latter hasn't really taken off since incubation. There have been some modest contributions to it but it does not seem to be used much and there is virtually nothing happening on it in terms of development. More worryingly, the people who initially contributed to it are not very active on the project (such is life, new jobs, different projects, etc...) anymore*. As for Nutch 2.0, it hasn't made any progress in  the last 12 months : we still have the same bugs, the tests do not work, the build has to be done manually etc...

At the same time, there has been a new lease of life into Nutch as a whole : there is definitely more activity on the mailing lists, new users, new active committers  etc... and quite a few bugfixes and improvements - most of them backported from what had been done in the trunk and people seem fairly happy with what we can do with 1.4

So the question is : what shall we do with 2.0? Here are a few possibilities :

a) put some effort into it, fix the bugs and make so that it can be used instead of 1.x
b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk again
c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two branches is quite a pain)
d) abandon the idea of a neutral storage layer with Gora and hardwire it to e.g. HBase

Option (a) has not happened in the last 12 months and I am not very hopeful about it.

What do you guys think?

   I know nothing about the 2.0 branch, and can't really contribute to that conversation (that job issue interferes with all my free time).

    Kirby

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com


Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Kirby Bohling <ki...@gmail.com>.
Julien,


On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> Hi Kirby,
>
> Grumble, Grumble.  (adding dev@nutch, as that is more than likely
>> where this discussion really belongs)...
>>
>
> am adding gora-dev@incubator.apache.org as well
>
>
>> It'd be really nice if folks could just follow the commands in the
>> nightly build, and get a build pushed out.  I've pointed this out
>> previously, and was told this would be fixed "shortly" (right after
>> GORA-0.1 finally got released, but not published in public maven repo,
>> which as far as I know, it still isn't published, but I stopped
>> checking on it).
>>
>
> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.
>

   I don't mean to be a complainer; I'd happily try to contribute fixes on
this one, but most of this would likely have to be done on Hudson/Jenkins.
I think you're addressing a larger issue than I really meant.  My point was
that however a developer does a build on their desktop, that same process
should be duplicated on Hudson/Jenkins.  If you need the trunk of Gora, then
is it possible to check it out, build it and install it to a local repo,
and then build Nutch via Hudson/Jenkins?  Whatever it takes to get a build
should be what the CI server is doing.  The repeatable but failing builds
are what really confuse and frustrate me.  The nightly/CI build should be
automating what devs do on their desktop, to ensure it'll work on a clean
setup.  Right now, it just tells you that for the last year, the totally
obvious steps will lead to a failure.
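
A minimal sketch of the idea above, in Python: the CI job runs the same
ordered steps a developer would run by hand (build the dependency, install it
locally, then build Nutch) and stops at the first failure. The step names and
the placeholder commands are assumptions for illustration only; this thread
does not give the actual checkout or build commands, so they are left as
stand-ins.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# A step is a (name, command) pair; the command is an argv-style list.
Step = Tuple[str, Sequence[str]]


def run_steps(steps: Sequence[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that fails.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Later steps depend on earlier ones, so there is no point continuing.
            break
        completed.append(name)
    return completed


def placeholder(message: str) -> List[str]:
    """Hypothetical stand-in for a real checkout or build command."""
    return [sys.executable, "-c", f"print({message!r})"]


# Hypothetical step list; replace each placeholder with the real command.
NIGHTLY_STEPS: List[Step] = [
    ("checkout-gora", placeholder("check out Gora trunk")),
    ("install-gora", placeholder("build Gora and install it to a local repo")),
    ("build-nutch", placeholder("build Nutch against the locally installed Gora")),
]

if __name__ == "__main__":
    done = run_steps(NIGHTLY_STEPS)
    print("completed:", ", ".join(done))
```

Keeping the step list as data means the same list can be run on a desktop and
on the CI server, which is the point being made: the two should not diverge.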

   I can figure out all of the configuration issues for Hudson/Jenkins to
make it work, if somebody can push that into the Apache version.  However, I
think answering your questions first would be a good idea.  My totally
non-binding +1 for setting up a CI/Nightly build for the various stable
branches too, the only one I found on Apache was for trunk.


>
>> As it happens, yesterday was the 1 year anniversary of the last
>> successful Hudson/Jenkins build...  If that actually worked, we could
>> point people towards it as a useful recipe for how to get a build
>> working off trunk.  I haven't been following Nutch too closely, but it
>> always strikes me as really odd, that there's a nightly build and it
>> doesn't bother anybody that it fails all the time (and that there
>> isn't a nightly build for the stable branches).
>>
>
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
>
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in  the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...
>
> At the same time, there has been a new lease of life into Nutch as a whole
> : there is definitely more activity on the mailing lists, new users, new
> active committers  etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4
>
> So the question is : what shall we do with 2.0? Here are a few
> possibilities :
>
> a) put some effort into it, fix the bugs and make so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
>
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
>
> What do you guys think?
>

   I know nothing about the 2.0 branch, and can't really contribute to that
conversation (that job issue interferes with all my free time).

    Kirby


> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by lewis john mcgibbney <le...@gmail.com>.
Glad to see we're making progress here.

Same with me, I am ready to move on with the project and move out of this
'rut' we have been in with trunk.

Thanks

On Sat, Sep 17, 2011 at 6:56 PM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Markus,
>
> No worries. I actually have no dog in this fight to be honest.
>
> I want Gora to be successful, and I want Nutch to be successful.
> I haven't contributed much to Nutch 2.0 trunk but I have
> to the 1.x series branch. I wish I knew more about Gora's internals (and
> am trying to learn) so I could help more with it. I think it will make a
> lot
> of sense to use it at some point.
>
> At the same time, I'm all for making 1.x releases and naturally getting to
> 2.0 over time based on our current progress and understanding. I'm also
> super excited about the 1.x versions of Nutch and when I think about it
> the reality is that they've always been Nutch trunk even though we
> artificially tried to turn the nutchbase branch into it.
>
> So to wrap it up, I'm totally fine with 1.x moving into trunk and with
> executing
> the plan I proposed a while back:
>
> ---snip
> 1. branch the current trunk as
> https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> 2. grab latest stable branch (e.g.,
> https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> *replace* the Nutch trunk with it, and bump the version # to 1.7-dev
> 3. active development on stable becomes active development in trunk and
> nutchgora still
> exists in case anyone ever resurrects it.
> ---snip
>
> Of course, it's not 1.6 (I was optimistic about getting there in 6 months
> ;) ), but it's really 1.4.
> And we don't need to bump to -dev since we're already in full dev with the
> 1.4 cycle.
>
> So, I'm ready for a VOTE. Feel free to call one (or have Julien do it), and
> I'll VOTE +1.
>
> Cheers,
> Chris
>
>
> On Sep 17, 2011, at 10:18 AM, Markus Jelsma wrote:
>
> > Hi Chris,
> >
> > I initially respawned this thread with the suggestion not to wait until
> > January or so before the vote. Hence my apologies for being impatient and
> > pessimistic about trunk :)
> >
> > Cheers,
> >
> >> Hey Julien,
> >>
> >> My option E was pretty much equivalent to B except I specified a time
> frame
> >> (next 6 months). Are we just saying that we'll accelerate the time frame
> >> to say, umm, next week or the week after? :)
> >>
> >> If so, fine by me. Since I moved nutchbase into the trunk at one point,
> I'd
> >> be happy once we've VOTEd and decided to be the one to execute moving it
> >> out.
> >>
> >> And yes, PMC votes will be binding and we'll do majority takes it, fine
> by
> >> me.
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Sep 17, 2011, at 1:45 AM, Julien Nioche wrote:
> >>> Let's keep it simple. Let's vote for option B (i.e. shelve 2.0), if
> most
> >>> people are in favour then we don't need to look into other options at
> >>> all. If not, we'll see what alternatives or arguments come up and vote
> >>> on these later.
> >>>
> >>> I assume that only PMC votes will be binding and the majority takes it?
> >>>
> >>> Julien
> >>>
> >>> On 16 September 2011 22:30, Mattmann, Chris A (388J)
> >>> <ch...@jpl.nasa.gov> wrote: Why don't we just collect VOTEs
> >>> for each of the options a-e, and then figure out based on that if there
> >>> is a majority. If there's no majority, we can whittle it down to say the
> >>> top 2-3, and then VOTE on those, looking for majority again.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:
> >>>> Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can
> >>>> always choose to hardwire HBASE (option D) later.
> >>>>
> >>>> Markus
> >>>>
> >>>>> Am happy to call for a vote on the future of Nutch 2.0 if you want.
> >>>>> Shall we reduce the various options described before to a single one?
> >>>>>
> >>>>> Julien
> >>>>>
> >>>>> On 15 September 2011 19:55, Markus Jelsma
> > <ma...@openindex.io>wrote:
> >>>>>>> Hi Guys,
> >>>>>>>
> >>>>>>> I thought I'd chime in on this thread. My comments below:
> >>>>>>>> I understand and share your frustration, however you need to bear
> >>>>>>>> in
> >>>>>>
> >>>>>> mind
> >>>>>>
> >>>>>>>> that things are done only if people volunteer and have time -
> >>>>>>>> usually taken from their holiday, weekends, evenings. Chris (who
> >>>>>>>> is the de
> >>>>>>
> >>>>>> facto
> >>>>>>
> >>>>>>>> release master for Nutch and Gora) has not had the time and nobody
> >>>>>>>> else has volunteered to do it.
> >>>>>>>
> >>>>>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release
> >>>>>>> that will address the Maven issues. However it is on my roadmap for
> >>>>>>> open
> >>>>>>
> >>>>>> source
> >>>>>>
> >>>>>>> stuff to get done in the next month, so that's a good thing. But
> >>>>>>> yes,
> >>>>>>
> >>>>>> that
> >>>>>>
> >>>>>>> portion of my open source work is all volunteer time, so sometimes
> >>>>>>> other things take priority.
> >>>>>>>
> >>>>>>>>> As it happens, yesterday was the 1 year anniversary of the last
> >>>>>>>>> successful Hudson/Jenkins build...  If that actually worked, we
> >>>>>>>>> could point people towards it as a useful recipe for how to get a
> >>>>>>>>> build working off trunk.  I haven't been following Nutch too
> >>>>>>>>> closely, but it always strikes me as really odd, that there's a
> >>>>>>>>> nightly build and it doesn't bother anybody that it fails all the
> >>>>>>>>> time (and that there isn't a nightly build for the stable
> >>>>>>>>> branches).
> >>>>>>>>
> >>>>>>>> The real issue behind all this is what we should do with Nutch
> 2.0.
> >>>>>>
> >>>>>> What
> >>>>>>
> >>>>>>>> follows is only my opinion and I would love to hear what others
> >>>>>>>> have to say on this subject.
> >>>>>>>>
> >>>>>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
> >>>>>>>> storage
> >>>>>>
> >>>>>> to
> >>>>>>
> >>>>>>>> Gora, the latter hasn't really taken off since incubation. There
> >>>>>>>> have been some modest contributions to it but it does not seem to
> >>>>>>>> be used much and there is virtually nothing happening on it in
> >>>>>>>> terms of development. More worryingly, the people who initially
> >>>>>>>> contributed to
> >>>>>>
> >>>>>> it
> >>>>>>
> >>>>>>>> are not very active on the project (such is life, new jobs,
> >>>>>>>> different projects, etc...) anymore. As for Nutch 2.0, it hasn't
> >>>>>>>> made any progress in  the last 12 months : we still have the same
> >>>>>>>> bugs, the
> >>>>>>
> >>>>>> tests
> >>>>>>
> >>>>>>>> do not work, the build has to be done manually etc...
> >>>>>>>
> >>>>>>> Yep.
> >>>>>>>
> >>>>>>>> At the same time, there has been a new lease of life into Nutch as
> >>>>>>>> a whole : there is definitely more activity on the mailing lists,
> >>>>>>>> new users, new active committers  etc... and quite a few bugfixes
> >>>>>>>> and improvements - most of them backported from what had been done
> >>>>>>>> in the trunk and people seem fairly happy with what we can do with
> >>>>>>>> 1.4
> >>>>>>>
> >>>>>>> Totally agreed. I'm actually not super surprised -- ever since 1.1,
> >>>>>>> I
> >>>>>>
> >>>>>> kind
> >>>>>>
> >>>>>>> of felt that maintaining a stable 1.X branch of Nutch (in parallel
> >>>>>>> to the 2.0 efforts) was really going to pay off since there was
> >>>>>>> renewed interest from users in leveraging (and furthermore
> >>>>>>> accepting) the nuances of 1.X.
> >>>>>>>
> >>>>>>>> So the question is : what shall we do with 2.0? Here are a few
> >>>>>>>> possibilities
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> a) put some effort into it, fix the bugs and make so that it can
> be
> >>>>>>
> >>>>>> used
> >>>>>>
> >>>>>>>> instead of 1.x
> >>>>>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x
> >>>>>>>> the trunk again
> >>>>>>>> c) do nothing : keep 2.0 and 1.x in parallel  (but having to
> >>>>>>>> maintain
> >>>>>>
> >>>>>> two
> >>>>>>
> >>>>>>>> branches is quite a pain)
> >>>>>>>> d) abandon the idea of a neutral storage layer with Gora and
> >>>>>>>> hardwire
> >>>>>>
> >>>>>> it
> >>>>>>
> >>>>>>>> to e.g. HBase
> >>>>>>>>
> >>>>>>>> Option (a) has not happened in the last 12 months and I am not
> very
> >>>>>>>> hopeful about it.
> >>>>>>>>
> >>>>>>>> What do you guys think?
> >>>>>>>
> >>>>>>> I'd suggest an option e). Evolve and keep releasing 1.X over the
> >>>>>>> next 6 months, and keep 2.0 in the trunk. After 6 months, see how
> >>>>>>> close 1.X is
> >>>>>>
> >>>>>> to
> >>>>>>
> >>>>>>> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If
> we
> >>>>>>> get to ~1.6 over the next 6 months and there is still no active
> >>>>>>> development
> >>>>>>
> >>>>>> on
> >>>>>>
> >>>>>>> 2.0, I'd propose we do this at that point in time:
> >>>>>>>
> >>>>>>> 1. branch the current trunk as
> >>>>>>> https://svn.apache.org/repos/asf/nutch/branches/nutchgora 2. grab
> >>>>>>> latest stable branch (e.g.,
> >>>>>>> https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> >>>>>>
> >>>>>> *replace*
> >>>>>>
> >>>>>>> the Nutch trunk with it, and bump the version # to 1.7-dev 3.
> active
> >>>>>>> development on stable becomes active development in trunk and
> >>>>>>> nutchgora still exists in case anyone ever resurrects it.
> >>>>>>>
> >>>>>>> That way, we give another 6 months to see how it shakes out and
> >>>>>>
> >>>>>> potentially
> >>>>>>
> >>>>>>> allow for 1 or 2 or 3 more stable releases before switching those
> >>>>>>> over to trunk.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>
> >>>>>> Yes. I don't believe we should wait until January before discussing
> >>>>>> this topic
> >>>>>> again. I, for example, cannot spend considerable extra time on the
> >>>>>> issues I put in 1.4, also due to the fact that it's not entirely
> >>>>>> stable.
> >>>>>>
> >>>>>> There are many things I can write about this topic right now but I
> >>>>>> don't feel it's necessary. The choice is difficult and perhaps
> >>>>>> painful but when the voting round is opened by our project lead, I
> >>>>>> will vote for promoting 1.x back
> >>>>>> to trunk.
> >>>>>>
> >>>>>> My apologies for my impatience and pessimism.
> >>>>>>
> >>>>>>> BTW, I have a couple contributions from my CS572: Search Engines
> >>>>>>> class
> >>>>>>
> >>>>>> from
> >>>>>>
> >>>>>>> a year ago that I'd love to port into the Nutch stable branch
> >>>>>>> including Hubs/Authorities ranking and some other goodies. I'll try
> >>>>>>> and work on those over the next few months, I'm just letting
> >>>>>>> everyone know now so I don't forget again :-)
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Chris
> >>>>>>>
> >>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>> Chris Mattmann, Ph.D.
> >>>>>>> Senior Computer Scientist
> >>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>>>>> Office: 171-266B, Mailstop: 171-246
> >>>>>>> Email: chris.a.mattmann@nasa.gov
> >>>>>>> WWW:   http://sunset.usc.edu/~mattmann/
> >>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>> Adjunct Assistant Professor, Computer Science Department
> >>>>>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
*Lewis*

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Markus,

No worries. I actually have no dog in this fight to be honest. 

I want Gora to be successful, and I want Nutch to be successful. 
I haven't contributed much to Nutch 2.0 trunk but I have 
to the 1.x series branch. I wish I knew more about Gora's internals (and 
am trying to learn) so I could help more with it. I think it will make a lot 
of sense to use it at some point.

At the same time, I'm all for making 1.x releases and naturally getting to 
2.0 over time based on our current progress and understanding. I'm also 
super excited about the 1.x versions of Nutch and when I think about it
the reality is that they've always been Nutch trunk even though we 
artificially tried to turn the nutchbase branch into it. 

So to wrap it up, I'm totally fine with 1.x moving into trunk and with executing 
the plan I proposed a while back:

---snip
1. branch the current trunk as https://svn.apache.org/repos/asf/nutch/branches/nutchgora
2. grab latest stable branch (e.g., https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and 
*replace* the Nutch trunk with it, and bump the version # to 1.7-dev
3. active development on stable becomes active development in trunk and nutchgora still 
exists in case anyone ever resurrects it.
---snip

Of course, it's not 1.6 (I was optimistic about getting there in 6 months ;) ), but it's really 1.4. 
And we don't need to bump to -dev since we're already in full dev with the 1.4 cycle. 
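
For illustration, the three-step plan above could be spelled out as the
following sequence of svn commands, generated here by a small Python helper so
they can be reviewed before anyone runs them. This is only a sketch under
stated assumptions: the trunk URL and the "branch-1.4" name are guesses by
analogy with the branch URLs quoted above, and the commit messages are made up.

```python
import shlex
from typing import List

# Repository root taken from the branch URLs quoted in the plan.
REPO = "https://svn.apache.org/repos/asf/nutch"


def trunk_swap_commands(stable_branch: str, shelf_name: str = "nutchgora") -> List[List[str]]:
    """Return the svn commands that shelve trunk and promote a stable branch."""
    trunk = f"{REPO}/trunk"  # assumed trunk location, not stated in the thread
    shelf = f"{REPO}/branches/{shelf_name}"
    stable = f"{REPO}/branches/{stable_branch}"
    return [
        # 1. Keep the current trunk around as a branch.
        ["svn", "copy", trunk, shelf, "-m", "Shelve the current trunk as a branch"],
        # 2. Remove the old trunk, then put the stable branch in its place.
        ["svn", "delete", trunk, "-m", "Remove the old trunk before replacing it"],
        ["svn", "copy", stable, trunk, "-m", f"Promote {stable_branch} to trunk"],
    ]


if __name__ == "__main__":
    # "branch-1.4" is a hypothetical name for the 1.4 development branch.
    for command in trunk_swap_commands("branch-1.4"):
        print(shlex.join(command))
```

The helper only prints the commands; nothing is executed against the
repository, and history is preserved because every step is a server-side copy
or delete.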

So, I'm ready for a VOTE. Feel free to call one (or have Julien do it), and I'll VOTE +1.
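
As a rough sketch of the voting scheme discussed in the quoted messages below
(only PMC votes are binding, majority takes it, otherwise narrow the field to
the top 2-3 options and vote again), the tally could look like this in Python.
The function name, the voter ids and the reading of "majority" as a strict
majority of the binding votes cast are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def tally(
    votes: Dict[str, str],
    pmc_members: Iterable[str],
    shortlist_size: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Count binding votes and return (winner, shortlist).

    votes maps a voter id to the option they chose (e.g. "a" to "e").
    If one option has a strict majority of the binding votes, it is the
    winner and the shortlist is empty. Otherwise the winner is None and the
    shortlist holds the top options to put to a second vote.
    """
    pmc = set(pmc_members)
    # Non-PMC votes are welcome but not binding, so they are not counted here.
    binding = Counter(option for voter, option in votes.items() if voter in pmc)
    total = sum(binding.values())
    if total == 0:
        return None, []
    ranked = binding.most_common()
    top_option, top_count = ranked[0]
    if top_count * 2 > total:
        return top_option, []
    return None, [option for option, _ in ranked[:shortlist_size]]


if __name__ == "__main__":
    # Hypothetical voters: three PMC members and one non-binding vote.
    example = {"pmc1": "b", "pmc2": "b", "pmc3": "e", "user1": "a"}
    print(tally(example, ["pmc1", "pmc2", "pmc3"]))
```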

Cheers,
Chris


On Sep 17, 2011, at 10:18 AM, Markus Jelsma wrote:

> Hi Chris,
> 
> I initially respawned this thread with the suggestion not to wait until
> January or so before the vote. Hence my apologies for being impatient and
> pessimistic about trunk :)
> 
> Cheers,
> 
>> Hey Julien,
>> 
>> My option E was pretty much equivalent to B except I specified a time frame
>> (next 6 months). Are we just saying that we'll accelerate the time frame
>> to say, umm, next week or the week after? :)
>> 
>> If so, fine by me. Since I moved nutchbase into the trunk at one point, I'd
>> be happy once we've VOTEd and decided to be the one to execute moving it
>> out.
>> 
>> And yes, PMC votes will be binding and we'll do majority takes it, fine by
>> me.
>> 
>> Cheers,
>> Chris
>> 
>> On Sep 17, 2011, at 1:45 AM, Julien Nioche wrote:
>>> Let's keep it simple. Let's vote for option B (i.e. shelve 2.0), if most
>>> people are in favour then we don't need to look into other options at
>>> all. If not, we'll see what alternatives or arguments come up and vote
>>> on these later.
>>> 
>>> I assume that only PMC votes will be binding and the majority takes it?
>>> 
>>> Julien
>>> 
>>> On 16 September 2011 22:30, Mattmann, Chris A (388J)
>>> <ch...@jpl.nasa.gov> wrote: Why don't we just collect VOTEs
>>> for each of the options a-e, and then figure out based on that if there
>>> is a majority. If there's no majority, we can whittle it down to say the
>>> top 2-3, and then VOTE on those, looking for majority again.
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:
>>>> Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can
>>>> always choose to hardwire HBASE (option D) later.
>>>> 
>>>> Markus
>>>> 
>>>>> Am happy to call for a vote on the future of Nutch 2.0 if you want.
>>>>> Shall we reduce the various options described before to a single one?
>>>>> 
>>>>> Julien
>>>>> 
>>>>> On 15 September 2011 19:55, Markus Jelsma
> <ma...@openindex.io>wrote:
>>>>>>> Hi Guys,
>>>>>>> 
>>>>>>> I thought I'd chime in on this thread. My comments below:
>>>>>>>> I understand and share your frustration, however you need to bear
>>>>>>>> in
>>>>>> 
>>>>>> mind
>>>>>> 
>>>>>>>> that things are done only if people volunteer and have time -
>>>>>>>> usually taken from their holiday, weekends, evenings. Chris (who
>>>>>>>> is the de
>>>>>> 
>>>>>> facto
>>>>>> 
>>>>>>>> release master for Nutch and Gora) has not had the time and nobody
>>>>>>>> else has volunteered to do it.
>>>>>>> 
>>>>>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release
>>>>>>> that will address the Maven issues. However it is on my roadmap for
>>>>>>> open
>>>>>> 
>>>>>> source
>>>>>> 
>>>>>>> stuff to get done in the next month, so that's a good thing. But
>>>>>>> yes,
>>>>>> 
>>>>>> that
>>>>>> 
>>>>>>> portion of my open source work is all volunteer time, so sometimes
>>>>>>> other things take priority.
>>>>>>> 
>>>>>>>>> As it happens, yesterday was the 1 year anniversary of the last
>>>>>>>>> successful Hudson/Jenkins build...  If that actually worked, we
>>>>>>>>> could point people towards it as a useful recipe for how to get a
>>>>>>>>> build working off trunk.  I haven't been following Nutch too
>>>>>>>>> closely, but it always strikes me as really odd, that there's a
>>>>>>>>> nightly build and it doesn't bother anybody that it fails all the
>>>>>>>>> time (and that there isn't a nightly build for the stable
>>>>>>>>> branches).
>>>>>>>> 
>>>>>>>> The real issue behind all this is what we should do with Nutch 2.0.
>>>>>> 
>>>>>> What
>>>>>> 
>>>>>>>> follows is only my opinion and I would love to hear what others
>>>>>>>> have to say on this subject.
>>>>>>>> 
>>>>>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
>>>>>>>> storage
>>>>>> 
>>>>>> to
>>>>>> 
>>>>>>>> Gora, the latter hasn't really taken off since incubation. There
>>>>>>>> have been some modest contributions to it but it does not seem to
>>>>>>>> be used much and there is virtually nothing happening on it in
>>>>>>>> terms of development. More worryingly, the people who initially
>>>>>>>> contributed to
>>>>>> 
>>>>>> it
>>>>>> 
>>>>>>>> are not very active on the project (such is life, new jobs,
>>>>>>>> different projects, etc...) anymore. As for Nutch 2.0, it hasn't
>>>>>>>> made any progress in  the last 12 months : we still have the same
>>>>>>>> bugs, the
>>>>>> 
>>>>>> tests
>>>>>> 
>>>>>>>> do not work, the build has to be done manually etc...
>>>>>>> 
>>>>>>> Yep.
>>>>>>> 
>>>>>>>> At the same time, there has been a new lease of life into Nutch as
>>>>>>>> a whole : there is definitely more activity on the mailing lists,
>>>>>>>> new users, new active committers  etc... and quite a few bugfixes
>>>>>>>> and improvements - most of them backported from what had been done
>>>>>>>> in the trunk and people seem fairly happy with what we can do with
>>>>>>>> 1.4
>>>>>>> 
>>>>>>> Totally agreed. I'm actually not super surprised -- ever since 1.1,
>>>>>>> I
>>>>>> 
>>>>>> kind
>>>>>> 
>>>>>>> of felt that maintaining a stable 1.X branch of Nutch (in parallel
>>>>>>> to the 2.0 efforts) was really going to pay off since there was
>>>>>>> renewed interest from users in leveraging (and furthermore
>>>>>>> accepting) the nuances of 1.X.
>>>>>>> 
>>>>>>>> So the question is : what shall we do with 2.0? Here are a few
>>>>>>>> possibilities
>>>>>>>> 
>>>>>>>> 
>>>>>>>> a) put some effort into it, fix the bugs and make so that it can be
>>>>>> 
>>>>>> used
>>>>>> 
>>>>>>>> instead of 1.x
>>>>>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x
>>>>>>>> the trunk again
>>>>>>>> c) do nothing : keep 2.0 and 1.x in parallel  (but having to
>>>>>>>> maintain
>>>>>> 
>>>>>> two
>>>>>> 
>>>>>>>> branches is quite a pain)
>>>>>>>> d) abandon the idea of a neutral storage layer with Gora and
>>>>>>>> hardwire
>>>>>> 
>>>>>> it
>>>>>> 
>>>>>>>> to e.g. HBase
>>>>>>>> 
>>>>>>>> Option (a) has not happened in the last 12 months and I am not very
>>>>>>>> hopeful about it.
>>>>>>>> 
>>>>>>>> What do you guys think?
>>>>>>> 
>>>>>>> I'd suggest an option e). Evolve and keep releasing 1.X over the
>>>>>>> next 6 months, and keep 2.0 in the trunk. After 6 months, see how
>>>>>>> close 1.X is
>>>>>> 
>>>>>> to
>>>>>> 
>>>>>>> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we
>>>>>>> get to ~1.6 over the next 6 months and there is still no active
>>>>>>> development
>>>>>> 
>>>>>> on
>>>>>> 
>>>>>>> 2.0, I'd propose we do this at that point in time:
>>>>>>> 
>>>>>>> 1. branch the current trunk as
>>>>>>> https://svn.apache.org/repos/asf/nutch/branches/nutchgora 2. grab
>>>>>>> latest stable branch (e.g.,
>>>>>>> https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
>>>>>> 
>>>>>> *replace*
>>>>>> 
>>>>>>> the Nutch trunk with it, and bump the version # to 1.7-dev 3. active
>>>>>>> development on stable becomes active development in trunk and
>>>>>>> nutchgora still exists in case anyone ever resurrects it.
>>>>>>> 
>>>>>>> That way, we give another 6 months to see how it shakes out and
>>>>>> 
>>>>>> potentially
>>>>>> 
>>>>>>> allow for 1 or 2 or 3 more stable releases before switching those
>>>>>>> over to trunk.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>> 
>>>>>> Yes. I don't believe we should wait until January before discussing
>>>>>> this topic
>>>>>> again. I, for example, cannot spend considerable extra time on the
>>>>>> issues I put in 1.4, also due to the fact that it's not entirely
>>>>>> stable.
>>>>>> 
>>>>>> There are many things I can write about this topic right now but I
>>>>>> don't feel it's necessary. The choice is difficult and perhaps
>>>>>> painful but when the voting round is opened by our project lead, I
>>>>>> will vote for promoting 1.x back
>>>>>> to trunk.
>>>>>> 
>>>>>> My apologies for my impatience and pessimism.
>>>>>> 
>>>>>>> BTW, I have a couple contributions from my CS572: Search Engines
>>>>>>> class
>>>>>> 
>>>>>> from
>>>>>> 
>>>>>>> a year ago that I'd love to port into the Nutch stable branch
>>>>>>> including Hubs/Authorities ranking and some other goodies. I'll try
>>>>>>> and work on those over the next few months, I'm just letting
>>>>>>> everyone know now so I don't forget again :-)
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>> 
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Senior Computer Scientist
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>> Email: chris.a.mattmann@nasa.gov
>>>>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Markus Jelsma <ma...@openindex.io>.
Hi Chris,

I initially respawned this thread with the suggestion not to wait until 
January or so before the vote. Hence my apologies for being impatient and 
pessimistic about trunk :)

Cheers,

> Hey Julien,
> 
> My option E was pretty much equivalent to B except I specified a time frame
> (next 6 months). Are we just saying that we'll accelerate the time frame
> to say, umm, next week or the week after? :)
> 
> If so, fine by me. Since I moved nutchbase into the trunk at one point, I'd
> be happy once we've VOTEd and decided to be the one to execute moving it
> out.
> 
> And yes, PMC votes will be binding and we'll do majority takes it, fine by
> me.
> 
> Cheers,
> Chris
> 
> On Sep 17, 2011, at 1:45 AM, Julien Nioche wrote:
> > Let's keep it simple. Let's vote for option B (i.e. shelve 2.0), if most
> > people are in favour then we don't need to look into other options at
> > all. If not, we'll see what alternatives or arguments come up and vote
> > on these later.
> > 
> > I assume that only PMC votes will be binding and the majority takes it?
> > 
> > Julien
> > 
> > > On 16 September 2011 22:30, Mattmann, Chris A (388J)
> > > <ch...@jpl.nasa.gov> wrote:
> > > Why don't we just collect VOTEs for each of the options a-e, and then
> > > figure out based on that if there is a majority. If there's no
> > > majority, we can widdle it down to say the top 2-3, and then VOTE on
> > > those, looking for majority again.
> > 
> > Cheers,
> > Chris
> > 
> > On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:
> > > Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can
> > > always choose to hardwire HBASE (option D) later.
> > > 
> > > Markus
> > > 
> > >> Am happy to call for a vote on the future of Nutch 2.0 if you want.
> > >> Shall we reduce the various options described before to a single one?
> > >> 
> > >> Julien
> > >> 
> > > >> On 15 September 2011 19:55, Markus Jelsma <ma...@openindex.io> wrote:
> > >>>> Hi Guys,
> > >>>> 
> > >>>> I thought I'd chime in on this thread. My comments below:
> > >>>>> I understand and share your frustration, however you need to bear
> > >>>>> in
> > >>> 
> > >>> mind
> > >>> 
> > >>>>> that things are done only if people volunteer and have time -
> > >>>>> usually taken from their holiday, weekends, evenings. Chris (who
> > >>>>> is the de
> > >>> 
> > >>> facto
> > >>> 
> > >>>>> release master for Nutch and Gora) has not had the time and nobody
> > >>>>> else has volunteered to do it.
> > >>>> 
> > >>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release
> > >>>> that will address the Maven issues. However it is on my roadmap for
> > >>>> open
> > >>> 
> > >>> source
> > >>> 
> > >>>> stuff to get done in the next month, so that's a good thing. But
> > >>>> yes,
> > >>> 
> > >>> that
> > >>> 
> > >>>> portion of my open source work is all volunteer time, so sometimes
> > >>>> other things take priority.
> > >>>> 
> > >>>>>> As it happens, yesterday was the 1 year anniversary of the last
> > >>>>>> successful Hudson/Jenkins build...  If that actually worked, we
> > >>>>>> could point people towards it as a useful recipe for how to get a
> > >>>>>> build working off trunk.  I haven't been following Nutch too
> > >>>>>> closely, but it always strikes me as really odd, that there's a
> > >>>>>> nightly build and it doesn't bother anybody that it fails all the
> > >>>>>> time (and that there isn't a nightly build for the stable
> > >>>>>> branches).
> > >>>>> 
> > >>>>> The real issue behind all this is what we should do with Nutch 2.0.
> > >>> 
> > >>> What
> > >>> 
> > >>>>> follows is only my opinion and I would love to hear what others
> > >>>>> have to say on this subject.
> > >>>>> 
> > >>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
> > >>>>> storage
> > >>> 
> > >>> to
> > >>> 
> > >>>>> Gora, the latter hasn't really taken off since incubation. There
> > >>>>> have been some modest contributions to it but it does not seem to
> > >>>>> be used much and there is virtually nothing happening on it in
> > >>>>> terms of development. More worryingly, the people who initially
> > >>>>> contributed to
> > >>> 
> > >>> it
> > >>> 
> > >>>>> are not very active on the project (such is life, new jobs,
> > >>>>> different projects, etc...) anymore·. As for Nutch 2.0, it hasn't
> > >>>>> made any progress in  the last 12 months : we still have the same
> > >>>>> bugs, the
> > >>> 
> > >>> tests
> > >>> 
> > >>>>> do not work, the build has to be done manually etc...
> > >>>> 
> > >>>> Yep.
> > >>>> 
> > >>>>> At the same time, there has been a new lease of life into Nutch as
> > >>>>> a whole : there is definitely more activity on the mailing lists,
> > >>>>> new users, new active committers  etc... and quite a few bugfixes
> > >>>>> and improvements - most of them backported from what had been done
> > >>>>> in the trunk and people seem fairly happy with what we can do with
> > >>>>> 1.4
> > >>>> 
> > >>>> Totally agreed. I'm actually not super surprised -- ever since 1.1,
> > >>>> I
> > >>> 
> > >>> kind
> > >>> 
> > >>>> of felt that maintaining a stable 1.X branch of Nutch (in parallel
> > >>>> to the 2.0 efforts) was really going to pay off since there was
> > >>>> renewed interest from users in leveraging (and furthermore
> > >>>> accepting) the nuances of 1.X.
> > >>>> 
> > >>>>> So the question is : what shall we do with 2.0? Here are a few
> > >>>>> possibilities
> > >>>>> 
> > >>>>> 
> > >>>>> a) put some effort into it, fix the bugs and make so that it can be
> > >>> 
> > >>> used
> > >>> 
> > >>>>> instead of 1.x
> > >>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x
> > >>>>> the trunk again
> > >>>>> c) do nothing : keep 2.0 and 1.x in parallel  (but having to
> > >>>>> maintain
> > >>> 
> > >>> two
> > >>> 
> > >>>>> branches is quite a pain)
> > >>>>> d) abandon the idea of a neutral storage layer with Gora and
> > >>>>> hardwire
> > >>> 
> > >>> it
> > >>> 
> > >>>>> to e.g. HBase
> > >>>>> 
> > >>>>> Option (a) has not happened in the last 12 months and I am not very
> > >>>>> hopeful about it.
> > >>>>> 
> > >>>>> What do you guys think?
> > >>>> 
> > >>>> I'd suggest an option e). Evolve and keep releasing 1.X over the
> > >>>> next 6 months, and keep 2.0 in the trunk. After 6 months, see how
> > >>>> close 1.X is
> > >>> 
> > >>> to
> > >>> 
> > >>>> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we
> > >>>> get to ~1.6 over the next 6 months and there is still no active
> > >>>> development
> > >>> 
> > >>> on
> > >>> 
> > >>>> 2.0, I'd propose we do this at that point in time:
> > >>>> 
> > > >>>> 1. branch the current trunk as
> > > >>>>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> > > >>>> 2. grab latest stable branch (e.g.,
> > > >>>>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> > > >>>>    *replace* the Nutch trunk with it, and bump the version # to
> > > >>>>    1.7-dev
> > > >>>> 3. active development on stable becomes active development in trunk
> > > >>>>    and nutchgora still exists in case anyone ever resurrects it.
> > >>>> 
> > >>>> That way, we give another 6 months to see how it shakes out and
> > >>> 
> > >>> potentially
> > >>> 
> > >>>> allow for 1 or 2 or 3 more stable releases before switching those
> > >>>> over to trunk.
> > >>>> 
> > >>>> Thoughts?
> > >>> 
> > >>> Yes. I don't believe we should wait until january before discussing
> > >>> this topic
> > >>> again. I, for example, cannot spend considerable extra time on the
> > >>> issues i put in 1.4, also due to the fact that it's not entirely
> > >>> stable.
> > >>> 
> > >>> There are many things i can write about this topic right now but
> > >>> don't feel it's neccessary. The choice is difficult and perhaps
> > >>> painful but when the voting round is opened by our project lead, i
> > >>> will vote for promoting 1.x back
> > >>> to trunk.
> > >>> 
> > >>> My apologies for my impatience and pessimism.
> > >>> 
> > >>>> BTW, I have a couple contributions from my CS572: Search Engines
> > >>>> class
> > >>> 
> > >>> from
> > >>> 
> > >>>> a year ago that I'd love to port into the Nutch stable branch
> > >>>> including Hubs/Authorities ranking and some other goodies. I'll try
> > >>>> and work on those over the next few months, I'm just letting
> > >>>> everyone know now so I don't forget again :-)
> > >>>> 
> > >>>> Cheers,
> > >>>> Chris
> > >>>> 
> > >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>> Chris Mattmann, Ph.D.
> > >>>> Senior Computer Scientist
> > >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >>>> Office: 171-266B, Mailstop: 171-246
> > >>>> Email: chris.a.mattmann@nasa.gov
> > >>>> WWW:   http://sunset.usc.edu/~mattmann/
> > >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>> Adjunct Assistant Professor, Computer Science Department
> > >>>> University of Southern California, Los Angeles, CA 90089 USA
> > >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: chris.a.mattmann@nasa.gov
> > WWW:   http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Julien,

My option E was pretty much equivalent to B except I specified a time frame (next 6 months). Are we just 
saying that we'll accelerate the time frame to say, umm, next week or the week after? :)

If so, fine by me. Since I moved nutchbase into the trunk at one point, I'd be happy, once we've VOTEd and
decided, to be the one to execute moving it out.

And yes, PMC votes will be binding and we'll do majority takes it, fine by me.

Cheers,
Chris

On Sep 17, 2011, at 1:45 AM, Julien Nioche wrote:

> Let's keep it simple. Let's vote for option B (i.e. shelve 2.0), if most people are in favour then we don't need to look into other options at all. If not, we'll see what alternatives or arguments come up and vote on these later.
> 
> I assume that only PMC votes will be binding and the majority takes it?
> 
> Julien
> 
> On 16 September 2011 22:30, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
> Why don't we just collect VOTEs for each of the options a-e, and then
> figure out based on that if there is a majority. If there's no majority, we
> can widdle it down to say the top 2-3, and then VOTE on those, looking
> for majority again.
> 
> Cheers,
> Chris
> 
> On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:
> 
> > Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always
> > choose to hardwire HBASE (option D) later.
> >
> > Markus
> >
> >> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
> >> we reduce the various options described before to a single one?
> >>
> >> Julien
> >>
> >> On 15 September 2011 19:55, Markus Jelsma <ma...@openindex.io> wrote:
> >>>> Hi Guys,
> >>>>
> >>>> I thought I'd chime in on this thread. My comments below:
> >>>>> I understand and share your frustration, however you need to bear in
> >>>
> >>> mind
> >>>
> >>>>> that things are done only if people volunteer and have time - usually
> >>>>> taken from their holiday, weekends, evenings. Chris (who is the de
> >>>
> >>> facto
> >>>
> >>>>> release master for Nutch and Gora) has not had the time and nobody
> >>>>> else has volunteered to do it.
> >>>>
> >>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> >>>> will address the Maven issues. However it is on my roadmap for open
> >>>
> >>> source
> >>>
> >>>> stuff to get done in the next month, so that's a good thing. But yes,
> >>>
> >>> that
> >>>
> >>>> portion of my open source work is all volunteer time, so sometimes
> >>>> other things take priority.
> >>>>
> >>>>>> As it happens, yesterday was the 1 year anniversary of the last
> >>>>>> successful Hudson/Jenkins build...  If that actually worked, we
> >>>>>> could point people towards it as a useful recipe for how to get a
> >>>>>> build working off trunk.  I haven't been following Nutch too
> >>>>>> closely, but it always strikes me as really odd, that there's a
> >>>>>> nightly build and it doesn't bother anybody that it fails all the
> >>>>>> time (and that there isn't a nightly build for the stable
> >>>>>> branches).
> >>>>>
> >>>>> The real issue behind all this is what we should do with Nutch 2.0.
> >>>
> >>> What
> >>>
> >>>>> follows is only my opinion and I would love to hear what others have
> >>>>> to say on this subject.
> >>>>>
> >>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
> >>>>> storage
> >>>
> >>> to
> >>>
> >>>>> Gora, the latter hasn't really taken off since incubation. There have
> >>>>> been some modest contributions to it but it does not seem to be used
> >>>>> much and there is virtually nothing happening on it in terms of
> >>>>> development. More worryingly, the people who initially contributed to
> >>>
> >>> it
> >>>
> >>>>> are not very active on the project (such is life, new jobs, different
> >>>>> projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> >>>>> progress in  the last 12 months : we still have the same bugs, the
> >>>
> >>> tests
> >>>
> >>>>> do not work, the build has to be done manually etc...
> >>>>
> >>>> Yep.
> >>>>
> >>>>> At the same time, there has been a new lease of life into Nutch as a
> >>>>> whole : there is definitely more activity on the mailing lists, new
> >>>>> users, new active committers  etc... and quite a few bugfixes and
> >>>>> improvements - most of them backported from what had been done in the
> >>>>> trunk and people seem fairly happy with what we can do with 1.4
> >>>>
> >>>> Totally agreed. I'm actually not super surprised -- ever since 1.1, I
> >>>
> >>> kind
> >>>
> >>>> of felt that maintaining a stable 1.X branch of Nutch (in parallel to
> >>>> the 2.0 efforts) was really going to pay off since there was renewed
> >>>> interest from users in leveraging (and furthermore accepting) the
> >>>> nuances of 1.X.
> >>>>
> >>>>> So the question is : what shall we do with 2.0? Here are a few
> >>>>> possibilities
> >>>>>
> >>>>>
> >>>>> a) put some effort into it, fix the bugs and make so that it can be
> >>>
> >>> used
> >>>
> >>>>> instead of 1.x
> >>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x the
> >>>>> trunk again
> >>>>> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain
> >>>
> >>> two
> >>>
> >>>>> branches is quite a pain)
> >>>>> d) abandon the idea of a neutral storage layer with Gora and hardwire
> >>>
> >>> it
> >>>
> >>>>> to e.g. HBase
> >>>>>
> >>>>> Option (a) has not happened in the last 12 months and I am not very
> >>>>> hopeful about it.
> >>>>>
> >>>>> What do you guys think?
> >>>>
> >>>> I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> >>>> months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is
> >>>
> >>> to
> >>>
> >>>> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we
> >>>> get to ~1.6 over the next 6 months and there is still no active
> >>>> development
> >>>
> >>> on
> >>>
> >>>> 2.0, I'd propose we do this at that point in time:
> >>>>
> >>>> 1. branch the current trunk as
> >>>>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> >>>> 2. grab latest stable branch (e.g.,
> >>>>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> >>>>    *replace* the Nutch trunk with it, and bump the version # to 1.7-dev
> >>>> 3. active development on stable becomes active development in trunk and
> >>>>    nutchgora still exists in case anyone ever resurrects it.
> >>>>
> >>>> That way, we give another 6 months to see how it shakes out and
> >>>
> >>> potentially
> >>>
> >>>> allow for 1 or 2 or 3 more stable releases before switching those over
> >>>> to trunk.
> >>>>
> >>>> Thoughts?
> >>>
> >>> Yes. I don't believe we should wait until january before discussing this
> >>> topic
> >>> again. I, for example, cannot spend considerable extra time on the issues
> >>> i put in 1.4, also due to the fact that it's not entirely stable.
> >>>
> >>> There are many things i can write about this topic right now but don't
> >>> feel it's neccessary. The choice is difficult and perhaps painful but
> >>> when the voting round is opened by our project lead, i will vote for
> >>> promoting 1.x back
> >>> to trunk.
> >>>
> >>> My apologies for my impatience and pessimism.
> >>>
> >>>> BTW, I have a couple contributions from my CS572: Search Engines class
> >>>
> >>> from
> >>>
> >>>> a year ago that I'd love to port into the Nutch stable branch including
> >>>> Hubs/Authorities ranking and some other goodies. I'll try and work on
> >>>> those over the next few months, I'm just letting everyone know now so I
> >>>> don't forget again :-)
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Chris Mattmann, Ph.D.
> >>>> Senior Computer Scientist
> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>> Office: 171-266B, Mailstop: 171-246
> >>>> Email: chris.a.mattmann@nasa.gov
> >>>> WWW:   http://sunset.usc.edu/~mattmann/
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Adjunct Assistant Professor, Computer Science Department
> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Julien Nioche <li...@gmail.com>.
Let's keep it simple. Let's vote for option B (i.e. shelve 2.0); if most
people are in favour then we don't need to look into other options at all.
If not, we'll see what alternatives or arguments come up and vote on these
later.

I assume that only PMC votes will be binding and the majority takes it?

Julien

On 16 September 2011 22:30, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Why don't we just collect VOTEs for each of the options a-e, and then
> figure out based on that if there is a majority. If there's no majority, we
> can widdle it down to say the top 2-3, and then VOTE on those, looking
> for majority again.
>
> Cheers,
> Chris
>
> On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:
>
> > Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always
> > choose to hardwire HBASE (option D) later.
> >
> > Markus
> >
> >> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
> >> we reduce the various options described before to a single one?
> >>
> >> Julien
> >>
> >> On 15 September 2011 19:55, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> >>>> Hi Guys,
> >>>>
> >>>> I thought I'd chime in on this thread. My comments below:
> >>>>> I understand and share your frustration, however you need to bear in
> >>>
> >>> mind
> >>>
> >>>>> that things are done only if people volunteer and have time - usually
> >>>>> taken from their holiday, weekends, evenings. Chris (who is the de
> >>>
> >>> facto
> >>>
> >>>>> release master for Nutch and Gora) has not had the time and nobody
> >>>>> else has volunteered to do it.
> >>>>
> >>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> >>>> will address the Maven issues. However it is on my roadmap for open
> >>>
> >>> source
> >>>
> >>>> stuff to get done in the next month, so that's a good thing. But yes,
> >>>
> >>> that
> >>>
> >>>> portion of my open source work is all volunteer time, so sometimes
> >>>> other things take priority.
> >>>>
> >>>>>> As it happens, yesterday was the 1 year anniversary of the last
> >>>>>> successful Hudson/Jenkins build...  If that actually worked, we
> >>>>>> could point people towards it as a useful recipe for how to get a
> >>>>>> build working off trunk.  I haven't been following Nutch too
> >>>>>> closely, but it always strikes me as really odd, that there's a
> >>>>>> nightly build and it doesn't bother anybody that it fails all the
> >>>>>> time (and that there isn't a nightly build for the stable
> >>>>>> branches).
> >>>>>
> >>>>> The real issue behind all this is what we should do with Nutch 2.0.
> >>>
> >>> What
> >>>
> >>>>> follows is only my opinion and I would love to hear what others have
> >>>>> to say on this subject.
> >>>>>
> >>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
> >>>>> storage
> >>>
> >>> to
> >>>
> >>>>> Gora, the latter hasn't really taken off since incubation. There have
> >>>>> been some modest contributions to it but it does not seem to be used
> >>>>> much and there is virtually nothing happening on it in terms of
> >>>>> development. More worryingly, the people who initially contributed to
> >>>
> >>> it
> >>>
> >>>>> are not very active on the project (such is life, new jobs, different
> >>>>> projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> >>>>> progress in  the last 12 months : we still have the same bugs, the
> >>>
> >>> tests
> >>>
> >>>>> do not work, the build has to be done manually etc...
> >>>>
> >>>> Yep.
> >>>>
> >>>>> At the same time, there has been a new lease of life into Nutch as a
> >>>>> whole : there is definitely more activity on the mailing lists, new
> >>>>> users, new active committers  etc... and quite a few bugfixes and
> >>>>> improvements - most of them backported from what had been done in the
> >>>>> trunk and people seem fairly happy with what we can do with 1.4
> >>>>
> >>>> Totally agreed. I'm actually not super surprised -- ever since 1.1, I
> >>>
> >>> kind
> >>>
> >>>> of felt that maintaining a stable 1.X branch of Nutch (in parallel to
> >>>> the 2.0 efforts) was really going to pay off since there was renewed
> >>>> interest from users in leveraging (and furthermore accepting) the
> >>>> nuances of 1.X.
> >>>>
> >>>>> So the question is : what shall we do with 2.0? Here are a few
> >>>>> possibilities
> >>>>>
> >>>>>
> >>>>> a) put some effort into it, fix the bugs and make so that it can be
> >>>
> >>> used
> >>>
> >>>>> instead of 1.x
> >>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x the
> >>>>> trunk again
> >>>>> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain
> >>>
> >>> two
> >>>
> >>>>> branches is quite a pain)
> >>>>> d) abandon the idea of a neutral storage layer with Gora and hardwire
> >>>
> >>> it
> >>>
> >>>>> to e.g. HBase
> >>>>>
> >>>>> Option (a) has not happened in the last 12 months and I am not very
> >>>>> hopeful about it.
> >>>>>
> >>>>> What do you guys think?
> >>>>
> >>>> I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> >>>> months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is
> >>>
> >>> to
> >>>
> >>>> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we
> >>>> get to ~1.6 over the next 6 months and there is still no active
> >>>> development
> >>>
> >>> on
> >>>
> >>>> 2.0, I'd propose we do this at that point in time:
> >>>>
> >>>> 1. branch the current trunk as
> >>>>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> >>>> 2. grab latest stable branch (e.g.,
> >>>>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> >>>>    *replace* the Nutch trunk with it, and bump the version # to 1.7-dev
> >>>> 3. active development on stable becomes active development in trunk and
> >>>>    nutchgora still exists in case anyone ever resurrects it.
> >>>>
> >>>> That way, we give another 6 months to see how it shakes out and
> >>>
> >>> potentially
> >>>
> >>>> allow for 1 or 2 or 3 more stable releases before switching those over
> >>>> to trunk.
> >>>>
> >>>> Thoughts?
> >>>
> >>> Yes. I don't believe we should wait until january before discussing this
> >>> topic
> >>> again. I, for example, cannot spend considerable extra time on the issues
> >>> i put in 1.4, also due to the fact that it's not entirely stable.
> >>>
> >>> There are many things i can write about this topic right now but don't
> >>> feel it's neccessary. The choice is difficult and perhaps painful but
> >>> when the voting round is opened by our project lead, i will vote for
> >>> promoting 1.x back
> >>> to trunk.
> >>>
> >>> My apologies for my impatience and pessimism.
> >>>
> >>>> BTW, I have a couple contributions from my CS572: Search Engines class
> >>>
> >>> from
> >>>
> >>>> a year ago that I'd love to port into the Nutch stable branch including
> >>>> Hubs/Authorities ranking and some other goodies. I'll try and work on
> >>>> those over the next few months, I'm just letting everyone know now so I
> >>>> don't forget again :-)
> >>>>
> >>>> Cheers,
> >>>> Chris
> >>>>
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Chris Mattmann, Ph.D.
> >>>> Senior Computer Scientist
> >>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>>> Office: 171-266B, Mailstop: 171-246
> >>>> Email: chris.a.mattmann@nasa.gov
> >>>> WWW:   http://sunset.usc.edu/~mattmann/
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> Adjunct Assistant Professor, Computer Science Department
> >>>> University of Southern California, Los Angeles, CA 90089 USA
> >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Why don't we just collect VOTEs for each of the options a-e, and then
figure out based on that if there is a majority. If there's no majority, we
can whittle it down to say the top 2-3, and then VOTE on those, looking
for majority again.

Cheers,
Chris

On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:

> Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always 
> choose to hardwire HBASE (option D) later.
> 
> Markus
> 
>> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
>> we reduce the various options described before to a single one?
>> 
>> Julien
>> 
>> On 15 September 2011 19:55, Markus Jelsma <ma...@openindex.io> wrote:
>>>> Hi Guys,
>>>> 
>>>> I thought I'd chime in on this thread. My comments below:
>>>>> I understand and share your frustration, however you need to bear in
>>> 
>>> mind
>>> 
>>>>> that things are done only if people volunteer and have time - usually
>>>>> taken from their holiday, weekends, evenings. Chris (who is the de
>>> 
>>> facto
>>> 
>>>>> release master for Nutch and Gora) has not had the time and nobody
>>>>> else has volunteered to do it.
>>>> 
>>>> Yep I haven't had the time to push a Gora 0.1.1-incubating release that
>>>> will address the Maven issues. However it is on my roadmap for open
>>> 
>>> source
>>> 
>>>> stuff to get done in the next month, so that's a good thing. But yes,
>>> 
>>> that
>>> 
>>>> portion of my open source work is all volunteer time, so sometimes
>>>> other things take priority.
>>>> 
>>>>>> As it happens, yesterday was the 1 year anniversary of the last
>>>>>> successful Hudson/Jenkins build...  If that actually worked, we
>>>>>> could point people towards it as a useful recipe for how to get a
>>>>>> build working off trunk.  I haven't been following Nutch too
>>>>>> closely, but it always strikes me as really odd, that there's a
>>>>>> nightly build and it doesn't bother anybody that it fails all the
>>>>>> time (and that there isn't a nightly build for the stable
>>>>>> branches).
>>>>> 
>>>>> The real issue behind all this is what we should do with Nutch 2.0.
>>> 
>>> What
>>> 
>>>>> follows is only my opinion and I would love to hear what others have
>>>>> to say on this subject.
>>>>> 
>>>>> Since we (actually mostly Dogacan) wrote 2.0 and delegated the
>>>>> storage
>>> 
>>> to
>>> 
>>>>> Gora, the latter hasn't really taken off since incubation. There have
>>>>> been some modest contributions to it but it does not seem to be used
>>>>> much and there is virtually nothing happening on it in terms of
>>>>> development. More worryingly, the people who initially contributed to
>>> 
>>> it
>>> 
>>>>> are not very active on the project (such is life, new jobs, different
>>>>> projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
>>>>> progress in  the last 12 months : we still have the same bugs, the
>>> 
>>> tests
>>> 
>>>>> do not work, the build has to be done manually etc...
>>>> 
>>>> Yep.
>>>> 
>>>>> At the same time, there has been a new lease of life into Nutch as a
>>>>> whole : there is definitely more activity on the mailing lists, new
>>>>> users, new active committers  etc... and quite a few bugfixes and
>>>>> improvements - most of them backported from what had been done in the
>>>>> trunk and people seem fairly happy with what we can do with 1.4
>>>> 
>>>> Totally agreed. I'm actually not super surprised -- ever since 1.1, I
>>> 
>>> kind
>>> 
>>>> of felt that maintaining a stable 1.X branch of Nutch (in parallel to
>>>> the 2.0 efforts) was really going to pay off since there was renewed
>>>> interest from users in leveraging (and furthermore accepting) the
>>>> nuances of 1.X.
>>>> 
>>>>> So the question is : what shall we do with 2.0? Here are a few
>>>>> possibilities
>>>>> 
>>>>> 
>>>>> a) put some effort into it, fix the bugs and make so that it can be
>>> 
>>> used
>>> 
>>>>> instead of 1.x
>>>>> b) shelve it and leave it for enthusiasts to play with + make 1.x the
>>>>> trunk again
>>>>> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain
>>> 
>>> two
>>> 
>>>>> branches is quite a pain)
>>>>> d) abandon the idea of a neutral storage layer with Gora and hardwire
>>> 
>>> it
>>> 
>>>>> to e.g. HBase
>>>>> 
>>>>> Option (a) has not happened in the last 12 months and I am not very
>>>>> hopeful about it.
>>>>> 
>>>>> What do you guys think?
>>>> 
>>>> I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
>>>> months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
>>>> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we
>>>> get to ~1.6 over the next 6 months and there is still no active
>>>> development on 2.0, I'd propose we do this at that point in time:
>>>> 
>>>> 1. branch the current trunk as
>>>> https://svn.apache.org/repos/asf/nutch/branches/nutchgora
>>>> 2. grab latest stable branch (e.g.,
>>>> https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
>>>> the Nutch trunk with it, and bump the version # to 1.7-dev
>>>> 3. active development on stable becomes active development in trunk and
>>>> nutchgora still exists in case anyone ever resurrects it.
>>>> 
>>>> That way, we give another 6 months to see how it shakes out and potentially
>>>> allow for 1 or 2 or 3 more stable releases before switching those over
>>>> to trunk.
>>>> 
>>>> Thoughts?
>>> 
>>> Yes. I don't believe we should wait until January before discussing this
>>> topic again. I, for example, cannot spend considerable extra time on the
>>> issues I put in 1.4, also due to the fact that it's not entirely stable.
>>>
>>> There are many things I can write about this topic right now but don't
>>> feel it's necessary. The choice is difficult and perhaps painful but
>>> when the voting round is opened by our project lead, I will vote for
>>> promoting 1.x back to trunk.
>>> 
>>> My apologies for my impatience and pessimism.
>>> 
>>>> BTW, I have a couple contributions from my CS572: Search Engines class from
>>>> a year ago that I'd love to port into the Nutch stable branch including
>>>> Hubs/Authorities ranking and some other goodies. I'll try and work on
>>>> those over the next few months, I'm just letting everyone know now so I
>>>> don't forget again :-)
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattmann@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Markus Jelsma <ma...@openindex.io>.
Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always
choose to hardwire HBase (option D) later.

Markus

> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
> we reduce the various options described before to a single one?
> 
> Julien
> 
> On 15 September 2011 19:55, Markus Jelsma <ma...@openindex.io> wrote:
> > > Hi Guys,
> > > 
> > > I thought I'd chime in on this thread. My comments below:
> > > > I understand and share your frustration, however you need to bear in mind
> > > > that things are done only if people volunteer and have time - usually
> > > > taken from their holiday, weekends, evenings. Chris (who is the de facto
> > > > release master for Nutch and Gora) has not had the time and nobody
> > > > else has volunteered to do it.
> > > 
> > > Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> > > will address the Maven issues. However it is on my roadmap for open source
> > > stuff to get done in the next month, so that's a good thing. But yes, that
> > > portion of my open source work is all volunteer time, so sometimes
> > > other things take priority.
> > > 
> > > >> As it happens, yesterday was the 1 year anniversary of the last
> > > >> successful Hudson/Jenkins build...  If that actually worked, we
> > > >> could point people towards it as a useful recipe for how to get a
> > > >> build working off trunk.  I haven't been following Nutch too
> > > >> closely, but it always strikes me as really odd, that there's a
> > > >> nightly build and it doesn't bother anybody that it fails all the
> > > >> time (and that there isn't a nightly build for the stable
> > > >> branches).
> > > > 
> > > > The real issue behind all this is what we should do with Nutch 2.0. What
> > > > follows is only my opinion and I would love to hear what others have
> > > > to say on this subject.
> > > >
> > > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> > > > Gora, the latter hasn't really taken off since incubation. There have
> > > > been some modest contributions to it but it does not seem to be used
> > > > much and there is virtually nothing happening on it in terms of
> > > > development. More worryingly, the people who initially contributed to it
> > > > are not very active on the project (such is life, new jobs, different
> > > > projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> > > > progress in the last 12 months : we still have the same bugs, the tests
> > > > do not work, the build has to be done manually etc...
> > > 
> > > Yep.
> > > 
> > > > At the same time, there has been a new lease of life into Nutch as a
> > > > whole : there is definitely more activity on the mailing lists, new
> > > > users, new active committers  etc... and quite a few bugfixes and
> > > > improvements - most of them backported from what had been done in the
> > > > trunk and people seem fairly happy with what we can do with 1.4
> > > 
> > > Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind
> > > of felt that maintaining a stable 1.X branch of Nutch (in parallel to
> > > the 2.0 efforts) was really going to pay off since there was renewed
> > > interest from users in leveraging (and furthermore accepting) the
> > > nuances of 1.X.
> > > 
> > > > So the question is : what shall we do with 2.0? Here are a few
> > > > possibilities
> > > > 
> > > > 
> > > > a) put some effort into it, fix the bugs and make so that it can be used
> > > > instead of 1.x
> > > > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > > > trunk again
> > > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two
> > > > branches is quite a pain)
> > > > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > > > to e.g. HBase
> > > > 
> > > > Option (a) has not happened in the last 12 months and I am not very
> > > > hopeful about it.
> > > > 
> > > > What do you guys think?
> > > 
> > > I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> > > months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
> > > actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we
> > > get to ~1.6 over the next 6 months and there is still no active
> > > development on 2.0, I'd propose we do this at that point in time:
> > > 
> > > 1. branch the current trunk as
> > > https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> > > 2. grab latest stable branch (e.g.,
> > > https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
> > > the Nutch trunk with it, and bump the version # to 1.7-dev
> > > 3. active development on stable becomes active development in trunk and
> > > nutchgora still exists in case anyone ever resurrects it.
> > > 
> > > That way, we give another 6 months to see how it shakes out and potentially
> > > allow for 1 or 2 or 3 more stable releases before switching those over
> > > to trunk.
> > > 
> > > Thoughts?
> > 
> > Yes. I don't believe we should wait until January before discussing this
> > topic again. I, for example, cannot spend considerable extra time on the
> > issues I put in 1.4, also due to the fact that it's not entirely stable.
> >
> > There are many things I can write about this topic right now but don't
> > feel it's necessary. The choice is difficult and perhaps painful but
> > when the voting round is opened by our project lead, I will vote for
> > promoting 1.x back to trunk.
> > 
> > My apologies for my impatience and pessimism.
> > 
> > > BTW, I have a couple contributions from my CS572: Search Engines class from
> > > a year ago that I'd love to port into the Nutch stable branch including
> > > Hubs/Authorities ranking and some other goodies. I'll try and work on
> > > those over the next few months, I'm just letting everyone know now so I
> > > don't forget again :-)
> > > 
> > > Cheers,
> > > Chris
> > > 
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Chris Mattmann, Ph.D.
> > > Senior Computer Scientist
> > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > Office: 171-266B, Mailstop: 171-246
> > > Email: chris.a.mattmann@nasa.gov
> > > WWW:   http://sunset.usc.edu/~mattmann/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Adjunct Assistant Professor, Computer Science Department
> > > University of Southern California, Los Angeles, CA 90089 USA
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Sami Siren <ss...@gmail.com>.
On Thu, Sep 15, 2011 at 9:55 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> There are many things I can write about this topic right now but don't feel
> it's necessary. The choice is difficult and perhaps painful but when the
> voting round is opened by our project lead, I will vote for promoting 1.x back
> to trunk.

+1, Same here

--
 Sami Siren

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Julien,

I didn't want to jump ship with this one, but it seems that the binding
community has already spoken their mind, and I for one second your
suggestion.

It's clear that trunk as it currently exists is not bleeding edge; there
have been too many broken fronts to launch a concentrated development
effort on, so it has simply not happened at all.

We all seem to be using 1.4 well and I am extremely impressed and very happy
with the way development is going. We are making a steady effort as a
community to address issues and the common community interests are usually
being met with reasonable support from anyone who can help out. If anything,
trunk is a bit of a headache and although some of us want to see it working
(me included), I don't think it is within the community's best interests.

I'm ready for a vote. And yes, I think the voting options should be reduced.
Based on past threads, it seemed to be a bit too complex, and the subsequent
outcome was that nothing was really done and trunk was still broken. Maybe
once Gora has matured a bit Nutch trunk will re-emerge as an attractive model.

Thank you

On Fri, Sep 16, 2011 at 5:26 PM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
> we reduce the various options described before to a single one?
>
> Julien
>
> On 15 September 2011 19:55, Markus Jelsma <ma...@openindex.io> wrote:
>
>>
>> > Hi Guys,
>> >
>> > I thought I'd chime in on this thread. My comments below:
>> > > I understand and share your frustration, however you need to bear in mind
>> > > that things are done only if people volunteer and have time - usually
>> > > taken from their holiday, weekends, evenings. Chris (who is the de facto
>> > > release master for Nutch and Gora) has not had the time and nobody else
>> > > has volunteered to do it.
>> >
>> > Yep I haven't had the time to push a Gora 0.1.1-incubating release that
>> > will address the Maven issues. However it is on my roadmap for open source
>> > stuff to get done in the next month, so that's a good thing. But yes, that
>> > portion of my open source work is all volunteer time, so sometimes other
>> > things take priority.
>> >
>> > >> As it happens, yesterday was the 1 year anniversary of the last
>> > >> successful Hudson/Jenkins build...  If that actually worked, we could
>> > >> point people towards it as a useful recipe for how to get a build
>> > >> working off trunk.  I haven't been following Nutch too closely, but it
>> > >> always strikes me as really odd, that there's a nightly build and it
>> > >> doesn't bother anybody that it fails all the time (and that there
>> > >> isn't a nightly build for the stable branches).
>> > >
>> > > The real issue behind all this is what we should do with Nutch 2.0. What
>> > > follows is only my opinion and I would love to hear what others have to
>> > > say on this subject.
>> > >
>> > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
>> > > Gora, the latter hasn't really taken off since incubation. There have
>> > > been some modest contributions to it but it does not seem to be used
>> > > much and there is virtually nothing happening on it in terms of
>> > > development. More worryingly, the people who initially contributed to it
>> > > are not very active on the project (such is life, new jobs, different
>> > > projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
>> > > progress in the last 12 months : we still have the same bugs, the tests
>> > > do not work, the build has to be done manually etc...
>> >
>> > Yep.
>> >
>> > > At the same time, there has been a new lease of life into Nutch as a
>> > > whole : there is definitely more activity on the mailing lists, new
>> > > users, new active committers  etc... and quite a few bugfixes and
>> > > improvements - most of them backported from what had been done in the
>> > > trunk and people seem fairly happy with what we can do with 1.4
>> >
>> > Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind
>> > of felt that maintaining a stable 1.X branch of Nutch (in parallel to the
>> > 2.0 efforts) was really going to pay off since there was renewed interest
>> > from users in leveraging (and furthermore accepting) the nuances of 1.X.
>> >
>> > > So the question is : what shall we do with 2.0? Here are a few
>> > > possibilities
>> > >
>> > >
>> > > a) put some effort into it, fix the bugs and make so that it can be used
>> > > instead of 1.x
>> > > b) shelve it and leave it for enthusiasts to play with + make 1.x the
>> > > trunk again
>> > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two
>> > > branches is quite a pain)
>> > > d) abandon the idea of a neutral storage layer with Gora and hardwire it
>> > > to e.g. HBase
>> > >
>> > > Option (a) has not happened in the last 12 months and I am not very
>> > > hopeful about it.
>> > >
>> > > What do you guys think?
>> >
>> > I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
>> > months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
>> > actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get
>> > to ~1.6 over the next 6 months and there is still no active development on
>> > 2.0, I'd propose we do this at that point in time:
>> >
>> > 1. branch the current trunk as
>> > https://svn.apache.org/repos/asf/nutch/branches/nutchgora
>> > 2. grab latest stable branch (e.g.,
>> > https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
>> > the Nutch trunk with it, and bump the version # to 1.7-dev
>> > 3. active development on stable becomes active development in trunk and
>> > nutchgora still exists in case anyone ever resurrects it.
>> >
>> > That way, we give another 6 months to see how it shakes out and potentially
>> > allow for 1 or 2 or 3 more stable releases before switching those over to
>> > trunk.
>> >
>> > Thoughts?
>>
>> Yes. I don't believe we should wait until January before discussing this
>> topic again. I, for example, cannot spend considerable extra time on the
>> issues I put in 1.4, also due to the fact that it's not entirely stable.
>>
>> There are many things I can write about this topic right now but don't
>> feel it's necessary. The choice is difficult and perhaps painful but when
>> the voting round is opened by our project lead, I will vote for promoting
>> 1.x back to trunk.
>>
>> My apologies for my impatience and pessimism.
>>
>> >
>> > BTW, I have a couple contributions from my CS572: Search Engines class from
>> > a year ago that I'd love to port into the Nutch stable branch including
>> > Hubs/Authorities ranking and some other goodies. I'll try and work on
>> > those over the next few months, I'm just letting everyone know now so I
>> > don't forget again :-)
>> >
>> > Cheers,
>> > Chris
>> >
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > Chris Mattmann, Ph.D.
>> > Senior Computer Scientist
>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > Office: 171-266B, Mailstop: 171-246
>> > Email: chris.a.mattmann@nasa.gov
>> > WWW:   http://sunset.usc.edu/~mattmann/
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > Adjunct Assistant Professor, Computer Science Department
>> > University of Southern California, Los Angeles, CA 90089 USA
>> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>



-- 
*Lewis*

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Julien Nioche <li...@gmail.com>.
Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall we
reduce the various options described before to a single one?

Julien

On 15 September 2011 19:55, Markus Jelsma <ma...@openindex.io> wrote:

>
> > Hi Guys,
> >
> > I thought I'd chime in on this thread. My comments below:
> > > I understand and share your frustration, however you need to bear in mind
> > > that things are done only if people volunteer and have time - usually
> > > taken from their holiday, weekends, evenings. Chris (who is the de facto
> > > release master for Nutch and Gora) has not had the time and nobody else
> > > has volunteered to do it.
> >
> > Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> > will address the Maven issues. However it is on my roadmap for open source
> > stuff to get done in the next month, so that's a good thing. But yes, that
> > portion of my open source work is all volunteer time, so sometimes other
> > things take priority.
> >
> > >> As it happens, yesterday was the 1 year anniversary of the last
> > >> successful Hudson/Jenkins build...  If that actually worked, we could
> > >> point people towards it as a useful recipe for how to get a build
> > >> working off trunk.  I haven't been following Nutch too closely, but it
> > >> always strikes me as really odd, that there's a nightly build and it
> > >> doesn't bother anybody that it fails all the time (and that there
> > >> isn't a nightly build for the stable branches).
> > >
> > > The real issue behind all this is what we should do with Nutch 2.0. What
> > > follows is only my opinion and I would love to hear what others have to
> > > say on this subject.
> > >
> > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> > > Gora, the latter hasn't really taken off since incubation. There have
> > > been some modest contributions to it but it does not seem to be used
> > > much and there is virtually nothing happening on it in terms of
> > > development. More worryingly, the people who initially contributed to it
> > > are not very active on the project (such is life, new jobs, different
> > > projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> > > progress in the last 12 months : we still have the same bugs, the tests
> > > do not work, the build has to be done manually etc...
> >
> > Yep.
> >
> > > At the same time, there has been a new lease of life into Nutch as a
> > > whole : there is definitely more activity on the mailing lists, new
> > > users, new active committers  etc... and quite a few bugfixes and
> > > improvements - most of them backported from what had been done in the
> > > trunk and people seem fairly happy with what we can do with 1.4
> >
> > Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind
> > of felt that maintaining a stable 1.X branch of Nutch (in parallel to the
> > 2.0 efforts) was really going to pay off since there was renewed interest
> > from users in leveraging (and furthermore accepting) the nuances of 1.X.
> >
> > > So the question is : what shall we do with 2.0? Here are a few
> > > possibilities
> > >
> > >
> > > a) put some effort into it, fix the bugs and make so that it can be used
> > > instead of 1.x
> > > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > > trunk again
> > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two
> > > branches is quite a pain)
> > > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > > to e.g. HBase
> > >
> > > Option (a) has not happened in the last 12 months and I am not very
> > > hopeful about it.
> > >
> > > What do you guys think?
> >
> > I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> > months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
> > actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get
> > to ~1.6 over the next 6 months and there is still no active development on
> > 2.0, I'd propose we do this at that point in time:
> >
> > 1. branch the current trunk as
> > https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> > 2. grab latest stable branch (e.g.,
> > https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
> > the Nutch trunk with it, and bump the version # to 1.7-dev
> > 3. active development on stable becomes active development in trunk and
> > nutchgora still exists in case anyone ever resurrects it.
> >
> > That way, we give another 6 months to see how it shakes out and potentially
> > allow for 1 or 2 or 3 more stable releases before switching those over to
> > trunk.
> >
> > Thoughts?
>
> Yes. I don't believe we should wait until January before discussing this
> topic again. I, for example, cannot spend considerable extra time on the
> issues I put in 1.4, also due to the fact that it's not entirely stable.
>
> There are many things I can write about this topic right now but don't feel
> it's necessary. The choice is difficult and perhaps painful but when the
> voting round is opened by our project lead, I will vote for promoting 1.x
> back to trunk.
>
> My apologies for my impatience and pessimism.
>
> >
> > BTW, I have a couple contributions from my CS572: Search Engines class from
> > a year ago that I'd love to port into the Nutch stable branch including
> > Hubs/Authorities ranking and some other goodies. I'll try and work on
> > those over the next few months, I'm just letting everyone know now so I
> > don't forget again :-)
> >
> > Cheers,
> > Chris
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: chris.a.mattmann@nasa.gov
> > WWW:   http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Markus Jelsma <ma...@openindex.io>.
> Hi Guys,
> 
> I thought I'd chime in on this thread. My comments below:
> > I understand and share your frustration, however you need to bear in mind
> > that things are done only if people volunteer and have time - usually
> > taken from their holiday, weekends, evenings. Chris (who is the de facto
> > release master for Nutch and Gora) has not had the time and nobody else
> > has volunteered to do it.
> 
> Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> will address the Maven issues. However it is on my roadmap for open source
> stuff to get done in the next month, so that's a good thing. But yes, that
> portion of my open source work is all volunteer time, so sometimes other
> things take priority.
> 
> >> As it happens, yesterday was the 1 year anniversary of the last
> >> successful Hudson/Jenkins build...  If that actually worked, we could
> >> point people towards it as a useful recipe for how to get a build
> >> working off trunk.  I haven't been following Nutch too closely, but it
> >> always strikes me as really odd, that there's a nightly build and it
> >> doesn't bother anybody that it fails all the time (and that there
> >> isn't a nightly build for the stable branches).
> > 
> > The real issue behind all this is what we should do with Nutch 2.0. What
> > follows is only my opinion and I would love to hear what others have to
> > say on this subject.
> > 
> > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> > Gora, the latter hasn't really taken off since incubation. There have
> > been some modest contributions to it but it does not seem to be used
> > much and there is virtually nothing happening on it in terms of
> > development. More worryingly, the people who initially contributed to it
> > are not very active on the project (such is life, new jobs, different
> > projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> > progress in  the last 12 months : we still have the same bugs, the tests
> > do not work, the build has to be done manually etc...
> 
> Yep.
> 
> > At the same time, there has been a new lease of life into Nutch as a
> > whole : there is definitely more activity on the mailing lists, new
> > users, new active committers  etc... and quite a few bugfixes and
> > improvements - most of them backported from what had been done in the
> > trunk and people seem fairly happy with what we can do with 1.4
> 
> Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind
> of felt that maintaining a stable 1.X branch of Nutch (in parallel to the
> 2.0 efforts) was really going to pay off since there was renewed interest
> from users in leveraging (and furthermore accepting) the nuances of 1.X.
> 
> > So the question is : what shall we do with 2.0? Here are a few
> > possibilities
> > 
> > 
> > a) put some effort into it, fix the bugs and make so that it can be used
> > instead of 1.x
> > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > trunk again
> > c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> > branches is quite a pain)
> > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > to e.g. HBase
> > 
> > Option (a) has not happened in the last 12 months and I am not very
> > hopeful about it.
> > 
> > What do you guys think?
> 
> I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get
> to ~1.6 over the next 6 months and there is still no active development on
> 2.0, I'd propose we do this at that point in time:
> 
> 1. branch the current trunk as
> https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> 2. grab latest stable branch (e.g.,
> https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
> the Nutch trunk with it, and bump the version # to 1.7-dev
> 3. active development on stable becomes active development in trunk and
> nutchgora still exists in case anyone ever resurrects it.
> 
> That way, we give another 6 months to see how it shakes out and potentially
> allow for 1 or 2 or 3 more stable releases before switching those over to
> trunk.
> 
> Thoughts?

Yes. I don't believe we should wait until January before discussing this topic
again. I, for example, cannot spend considerable extra time on the issues I
put in 1.4, also due to the fact that it's not entirely stable.

There are many things I can write about this topic right now but don't feel
it's necessary. The choice is difficult and perhaps painful but when the
voting round is opened by our project lead, I will vote for promoting 1.x back
to trunk.

My apologies for my impatience and pessimism.

> 
> BTW, I have a couple contributions from my CS572: Search Engines class from
> a year ago that I'd love to port into the Nutch stable branch including
> Hubs/Authorities ranking and some other goodies. I'll try and work on
> those over the next few months, I'm just letting everyone know now so I
> don't forget again :-)
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Markus Jelsma <ma...@openindex.io>.
> Hi Guys,
> 
> I thought I'd chime in on this thread. My comments below:
> > I understand and share your frustration, however you need to bear in mind
> > that things are done only if people volunteer and have time - usually
> > taken from their holiday, weekends, evenings. Chris (who is the de facto
> > release master for Nutch and Gora) has not had the time and nobody else
> > has volunteered to do it.
> 
> Yep I haven't had the time to push a Gora 0.1.1-incubating release that
> will address the Maven issues. However it is on my roadmap for open source
> stuff to get done in the next month, so that's a good thing. But yes, that
> portion of my open source work is all volunteer time, so sometimes other
> things take priority.
> 
> >> As it happens, yesterday was the 1 year anniversary of the last
> >> successful Hudson/Jenkins build...  If that actually worked, we could
> >> point people towards it as a useful recipe for how to get a build
> >> working off trunk.  I haven't been following Nutch too closely, but it
> >> always strikes me as really odd, that there's a nightly build and it
> >> doesn't bother anybody that it fails all the time (and that there
> >> isn't a nightly build for the stable branches).
> > 
> > The real issue behind all this is what we should do with Nutch 2.0. What
> > follows is only my opinion and I would love to hear what others have to
> > say on this subject.
> > 
> > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> > Gora, the latter hasn't really taken off since incubation. There have
> > been some modest contributions to it but it does not seem to be used
> > much and there is virtually nothing happening on it in terms of
> > development. More worryingly, the people who initially contributed to it
> > are not very active on the project (such is life, new jobs, different
> > projects, etc...) anymore. As for Nutch 2.0, it hasn't made any
> > progress in  the last 12 months : we still have the same bugs, the tests
> > do not work, the build has to be done manually etc...
> 
> Yep.
> 
> > At the same time, there has been a new lease of life into Nutch as a
> > whole : there is definitely more activity on the mailing lists, new
> > users, new active committers  etc... and quite a few bugfixes and
> > improvements - most of them backported from what had been done in the
> > trunk and people seem fairly happy with what we can do with 1.4
> 
> Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind
> of felt that maintaining a stable 1.X branch of Nutch (in parallel to the
> 2.0 efforts) was really going to pay off since there was renewed interest
> from users in leveraging (and furthermore accepting) the nuances of 1.X.
> 
> > So the question is : what shall we do with 2.0? Here are a few
> > possibilities
> > 
> > 
> > a) put some effort into it, fix the bugs and make so that it can be used
> > instead of 1.x
> > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > trunk again
> > c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> > branches is quite a pain)
> > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > to e.g. HBase
> > 
> > Option (a) has not happened in the last 12 months and I am not very
> > hopeful about it.
> > 
> > What do you guys think?
> 
> I'd suggest an option e). Evolve and keep releasing 1.X over the next 6
> months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to
> actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get
> to ~1.6 over the next 6 months and there is still no active development on
> 2.0, I'd propose we do this at that point in time:
> 
> 1. branch the current trunk as
>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> 2. grab latest stable branch (e.g.,
>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace*
>    the Nutch trunk with it, and bump the version # to 1.7-dev
> 3. active development on stable becomes active development in trunk and
>    nutchgora still exists in case anyone ever resurrects it.
> 
> That way, we give another 6 months to see how it shakes out and potentially
> allow for 1 or 2 or 3 more stable releases before switching those over to
> trunk.
> 
> Thoughts?

Yes. I don't believe we should wait until January before discussing this topic
again. I, for example, cannot spend considerable extra time on the issues I
put in 1.4, also due to the fact that it's not entirely stable.

There are many things I could write about this topic right now, but I don't
feel it's necessary. The choice is difficult and perhaps painful, but when the
voting round is opened by our project lead, I will vote for promoting 1.x back
to trunk.

My apologies for my impatience and pessimism.

> 
> BTW, I have a couple contributions from my CS572: Search Engines class from
> a year ago that I'd love to port into the Nutch stable branch including
> Hubs/Authorities ranking and some other goodies. I'll try and work on
> those over the next few months, I'm just letting everyone know now so I
> don't forget again :-)
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Guys,

I thought I'd chime in on this thread. My comments below:

> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.

Yep I haven't had the time to push a Gora 0.1.1-incubating release that will 
address the Maven issues. However it is on my roadmap for open source 
stuff to get done in the next month, so that's a good thing. But yes, that portion of 
my open source work is all volunteer time, so sometimes other things take 
priority. 

> 
> 
>> As it happens, yesterday was the 1 year anniversary of the last
>> successful Hudson/Jenkins build...  If that actually worked, we could
>> point people towards it as a useful recipe for how to get a build
>> working off trunk.  I haven't been following Nutch too closely, but it
>> always strikes me as really odd, that there's a nightly build and it
>> doesn't bother anybody that it fails all the time (and that there
>> isn't a nightly build for the stable branches).
>> 
> 
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
> 
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in  the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...

Yep.

> 
> At the same time, there has been a new lease of life into Nutch as a whole :
> there is definitely more activity on the mailing lists, new users, new
> active committers  etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4

Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind of felt that 
maintaining a stable 1.X branch of Nutch (in parallel to the 2.0 efforts) was really 
going to pay off since there was renewed interest from users in leveraging 
(and furthermore accepting) the nuances of 1.X.

> 
> So the question is : what shall we do with 2.0? Here are a few possibilities
> :
> 
> a) put some effort into it, fix the bugs and make so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
> 
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
> 
> What do you guys think?

I'd suggest an option e). Evolve and keep releasing 1.X over the next 6 months, 
and keep 2.0 in the trunk. After 6 months, see how close 1.X is to actually being 
2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get to ~1.6 over the next 6 months 
and there is still no active development on 2.0, I'd propose we do this at that point 
in time:

1. branch the current trunk as https://svn.apache.org/repos/asf/nutch/branches/nutchgora
2. grab latest stable branch (e.g., https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and 
*replace* the Nutch trunk with it, and bump the version # to 1.7-dev
3. active development on stable becomes active development in trunk and nutchgora still 
exists in case anyone ever resurrects it.

That way, we give another 6 months to see how it shakes out and potentially allow for 1 or 2 or 3
more stable releases before switching those over to trunk.
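
As a rough illustration only, the three steps above could be sketched with
URL-to-URL svn operations. This is a dry run that merely prints the commands;
the trunk URL, the commit messages and the function name plan_trunk_swap are
assumptions made for the sketch, and branch-1.6 is only the "e.g." branch
named in step 2.

```shell
#!/bin/sh
# Dry-run sketch: prints the svn commands rather than executing them.
REPO="https://svn.apache.org/repos/asf/nutch"
# Assumption: the latest stable branch; "branch-1.6" is only an example name.
STABLE="${STABLE:-branch-1.6}"

plan_trunk_swap() {
  # 1. Preserve the current (Gora-based) trunk as the nutchgora branch.
  echo "svn copy $REPO/trunk $REPO/branches/nutchgora -m 'Branch trunk as nutchgora'"
  # 2. Replace trunk with the latest stable branch.
  echo "svn delete $REPO/trunk -m 'Remove old trunk before promoting $STABLE'"
  echo "svn copy $REPO/branches/$STABLE $REPO/trunk -m 'Promote $STABLE to trunk'"
  # 3. The version bump to 1.7-dev would then be an ordinary commit on the new trunk.
}

plan_trunk_swap
```

Printing the plan first lets it be reviewed on the list before anyone with
commit rights runs the real commands.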

Thoughts?

BTW, I have a couple contributions from my CS572: Search Engines class from a year ago that 
I'd love to port into the Nutch stable branch including Hubs/Authorities ranking and some other 
goodies. I'll try and work on those over the next few months, I'm just letting everyone know now 
so I don't forget again :-)

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Markus Jelsma <ma...@openindex.io>.
Julien, devs, users,

I'd like to see bugs fixed in 2.0 but some of them are way out of my league or 
would cost me an absurd amount of time. I'd also really like to use Gora but 
Gora must be maintained. Gora will play a fundamental role in 2.0 and if 
something is broken there it is not trivial to fix it for us Nutch devs as it 
is yet another component to worry about.

Tika is going well: it's being worked on and there is good enough progress to
rely on from our perspective. If this is not going to be the case with Gora we
should maybe decide to drop it and hardwire HBase into Nutch.

Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but I'm not
sure the currently active Nutch devs are going to fix it just like that.

Cheers,

On Tuesday 09 August 2011 17:10:12 Julien Nioche wrote:
> Hi Kirby,
> 
> Grumble, Grumble.  (adding dev@nutch, as that is more than likely
> 
> > where this discussion really belongs)...
> 
> am adding gora-dev@incubator.apache.org as well
> 
> > It'd be really nice if folks could just follow the commands in the
> > nightly build, and get a build pushed out.  I've pointed this out
> > previously, and was told this would be fixed "shortly" (right after
> > GORA-0.1 finally got released, but not published in public maven repo,
> > which as far as I know, it still isn't published, but I stopped
> > checking on it).
> 
> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.
> 
> > As it happens, yesterday was the 1 year anniversary of the last
> > successful Hudson/Jenkins build...  If that actually worked, we could
> > point people towards it as a useful recipe for how to get a build
> > working off trunk.  I haven't been following Nutch too closely, but it
> > always strikes me as really odd, that there's a nightly build and it
> > doesn't bother anybody that it fails all the time (and that there
> > isn't a nightly build for the stable branches).
> 
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
> 
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in  the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...
> 
> At the same time, there has been a new lease of life into Nutch as a whole
> : there is definitely more activity on the mailing lists, new users, new
> active committers  etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4
> 
> So the question is : what shall we do with 2.0? Here are a few
> possibilities
> 
> 
> a) put some effort into it, fix the bugs and make so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
> 
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
> 
> What do you guys think?
> 
> Julien

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Posted by Kirby Bohling <ki...@gmail.com>.
Julien,


On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche <
lists.digitalpebble@gmail.com> wrote:

> Hi Kirby,
>
> Grumble, Grumble.  (adding dev@nutch, as that is more than likely
>> where this discussion really belongs)...
>>
>
> am adding gora-dev@incubator.apache.org as well
>
>
>> It'd be really nice if folks could just follow the commands in the
>> nightly build, and get a build pushed out.  I've pointed this out
>> previously, and was told this would be fixed "shortly" (right after
>> GORA-0.1 finally got released, but not published in public maven repo,
>> which as far as I know, it still isn't published, but I stopped
>> checking on it).
>>
>
> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.
>

   I don't mean to be a complainer, and I'd happily try to contribute fixes on
this one, but most of this would likely have to be done on Hudson/Jenkins.
I think you're addressing a larger issue than I really meant.  My point was
that however a developer does a build on their desktop, that process
should be duplicated on Hudson/Jenkins.  If you need the trunk of Gora, then
is it possible to check it out, build it and install it to a local repo,
and then build Nutch via Hudson/Jenkins?  Whatever it takes to get a build
should be what the CI server is doing.  The repeatable but failing builds
are what really confuse and frustrate me.  The nightly/CI build should be
automating what devs do on their desktop to ensure it'll work on a clean
setup.  Right now, it just tells you that for the last year, the totally
obvious steps will lead to a failure.

   I can figure out all of the configuration issues for Hudson/Jenkins to
make it work, if somebody can push that into the Apache version.  However, I
think answering your questions first would be a good idea.  My totally
non-binding +1 for setting up a CI/Nightly build for the various stable
branches too, the only one I found on Apache was for trunk.


>
>> As it happens, yesterday was the 1 year anniversary of the last
>> successful Hudson/Jenkins build...  If that actually worked, we could
>> point people towards it as a useful recipe for how to get a build
>> working off trunk.  I haven't been following Nutch too closely, but it
>> always strikes me as really odd, that there's a nightly build and it
>> doesn't bother anybody that it fails all the time (and that there
>> isn't a nightly build for the stable branches).
>>
>
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
>
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in  the last 12
> months : we still have the same bugs, the tests do not work, the build has
> to be done manually etc...
>
> At the same time, there has been a new lease of life into Nutch as a whole
> : there is definitely more activity on the mailing lists, new users, new
> active committers  etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4
>
> So the question is : what shall we do with 2.0? Here are a few
> possibilities :
>
> a) put some effort into it, fix the bugs and make so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
>
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
>
> What do you guys think?
>

   I know nothing about the 2.0 branch, and can't really contribute to that
conversation (that job issue interferes with all my free time).

    Kirby


> Julien
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
