Posted to dev@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/02/17 16:23:03 UTC

[DISCUSS] Nutchgora 2.0 release

Hi Guys,

Here we are again :0)

What are people's thoughts on aiming for a 2.0 release? We have one
blocking issue, the webapp, about which I got no response from the
community at large. I would like to see this addressed, but that is a
separate issue.

Speaking with the future in mind, we are hoping to get a Gora 0.2 release
out of the door once a licensing issue (the only blocker) and a few other
things are dealt with. Would it therefore be realistic to aim for a Nutch
2.0 release shortly after that?

My justification for raising this thread again is that we are seeing
(some) more users interested in this branch/code, and I think it is a real
shame that we have not been able to get a release out yet. I would really
like to get more people using the code and hopefully getting involved in
identifying bugs, and fixing them if possible.

The question has been open for ages, so I just wonder if anything has
changed now that Gora has been doing better recently.

Thanks

Lewis

-- 
*Lewis*

Re: [DISCUSS] Nutchgora 2.0 release

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
+1 guys. Just let me know when you are ready and I can RM it.

Cheers,
Chris

On Feb 20, 2012, at 8:01 AM, Lewis John Mcgibbney wrote:

> Hi,
> 
> Not ignoring Chris' comments, but addressing the points below first, please see comments.
> 
> On Mon, Feb 20, 2012 at 2:57 PM, Ferdy Galema <fe...@kalooga.com> wrote:
> Aside from the licensing issue, the only thing I really see as a blocker or as something we need to deal with first is Nutch-1205 (upgrade Gora libs). What are we going to do with that one? 
> I'm going to have another crack at these Ivy resolvers; they are really quite hard to debug. I can only assume the unresolved dependencies are picked up somewhere upstream! As I said, I'm going to try to crack this one, maybe today if I get the time.
> 
> 
> About the Nutch API (webapp), my colleague and I have some ideas about how to improve it in such a way that it is really easy to use. It definitely won't be ready for an upcoming release, especially if there will be a release very soon. Please see the issue[1] for details. I'm not sure what to do with the current webapp implementation, but my suggestion is to just leave it as it is. (Perhaps mark it as a work-in-progress.)
> 
> This sounds really encouraging. Somewhere in my crazy pot of thoughts was the idea of establishing this task as a GSoC project. On reflection, I think it would be excellent if the work could be dev/user community driven, as it would cater exactly for what we need and want.
> 
> Please see here for the most up-to-date work I could get done on this. I updated it slightly to reflect some recent findings. I'll report back when I get more time on the blocker you mention above.
> 
> http://wiki.apache.org/nutch/NutchAdministrationUserInterface


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: [DISCUSS] Nutchgora 2.0 release

Posted by Ferdy Galema <fe...@kalooga.com>.
Thanks Lewis, that's a really useful link. I've updated the Jira issue.

On Mon, Feb 20, 2012 at 5:01 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi,
>
> Not ignoring Chris' comments, but addressing the points below first,
> please see comments.
>
> On Mon, Feb 20, 2012 at 2:57 PM, Ferdy Galema <fe...@kalooga.com> wrote:
>
>> Aside from the licensing issue, the only thing I really see as a blocker
>> or as something we need to deal with first is Nutch-1205 (upgrade Gora
>> libs). What are we going to do with that one?
>>
> I'm going to have another crack at these Ivy resolvers; they are really
> quite hard to debug. I can only assume the unresolved dependencies are
> picked up somewhere upstream! As I said, I'm going to try to crack this
> one, maybe today if I get the time.
>
>
>>
>> About the Nutch API (webapp), my colleague and I have some ideas about
>> how to improve it in such a way that it is really easy to use. It
>> definitely won't be ready for an upcoming release, especially if there
>> will be a release very soon. Please see the issue[1] for details. I'm not
>> sure what to do with the current webapp implementation, but my suggestion
>> is to just leave it as it is. (Perhaps mark it as a work-in-progress.)
>>
> This sounds really encouraging. Somewhere in my crazy pot of thoughts was
> the idea of establishing this task as a GSoC project. On reflection, I
> think it would be excellent if the work could be dev/user community
> driven, as it would cater exactly for what we need and want.
>
> Please see here for the most up-to-date work I could get done on this. I
> updated it slightly to reflect some recent findings. I'll report back when
> I get more time on the blocker you mention above.
>
> http://wiki.apache.org/nutch/NutchAdministrationUserInterface
>

Re: [DISCUSS] Nutchgora 2.0 release

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,

Not ignoring Chris' comments, but addressing the points below first, please
see comments.

On Mon, Feb 20, 2012 at 2:57 PM, Ferdy Galema <fe...@kalooga.com> wrote:

> Aside from the licensing issue, the only thing I really see as a blocker
> or as something we need to deal with first is Nutch-1205 (upgrade Gora
> libs). What are we going to do with that one?
>
I'm going to have another crack at these Ivy resolvers; they are really
quite hard to debug. I can only assume the unresolved dependencies are
picked up somewhere upstream! As I said, I'm going to try to crack this
one, maybe today if I get the time.


>
> About the Nutch API (webapp), my colleague and I have some ideas about how
> to improve it in such a way that it is really easy to use. It definitely
> won't be ready for an upcoming release, especially if there will be a
> release very soon. Please see the issue[1] for details. I'm not sure what
> to do with the current webapp implementation, but my suggestion is to just
> leave it as it is. (Perhaps mark it as a work-in-progress.)
>
This sounds really encouraging. Somewhere in my crazy pot of thoughts was
the idea of establishing this task as a GSoC project. On reflection, I
think it would be excellent if the work could be dev/user community driven,
as it would cater exactly for what we need and want.

Please see here for the most up-to-date work I could get done on this. I
updated it slightly to reflect some recent findings. I'll report back when
I get more time on the blocker you mention above.

http://wiki.apache.org/nutch/NutchAdministrationUserInterface

Re: [DISCUSS] Nutchgora 2.0 release

Posted by Ferdy Galema <fe...@kalooga.com>.
Hi,

Aside from the licensing issue, the only thing I really see as a blocker or
as something we need to deal with first is Nutch-1205 (upgrade Gora libs).
What are we going to do with that one?

About the Nutch API (webapp), my colleague and I have some ideas about how
to improve it in such a way that it is really easy to use. It definitely
won't be ready for an upcoming release, especially if there will be a
release very soon. Please see the issue[1] for details. I'm not sure what
to do with the current webapp implementation, but my suggestion is to just
leave it as it is. (Perhaps mark it as a work-in-progress.)

Ferdy.

[1] https://issues.apache.org/jira/browse/NUTCH-1286


On Sat, Feb 18, 2012 at 8:10 PM, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hey Lewis,
>
> I'd be +1 to roll a Nutchgora 2.0 release.
>
> I could see dealing with this in two ways, neither of which I like better
> than the other:
>
> 1. Release the nutchgora branch as "apache-nutch-2.0", so that nutchgora
> becomes the 2.0 branch of the system (and we could create branch-2.0). As
> the 1.x trunk branch evolves and its version numbers approach 2.0, its
> last release would be 1.9, and then we do 3.0, which could be either:
>  - a merge or combination of 1.x features and 2.x features
>  - simply the next path for 1.x, and independent of 2.x
>
> 2. Call the artifact "apache-nutchgora-2.0", independent of the current
> trunk artifact and its release cycle.
>
> Either way is fine with me.
>
> Cheers,
> Chris
>
> On Feb 17, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:
>
> > Hi Guys,
> >
> > Here we are again :0)
> >
> > What are people's thoughts on aiming for a 2.0 release? We have one
> > blocking issue, the webapp, about which I got no response from the
> > community at large. I would like to see this addressed, but that is a
> > separate issue.
> >
> > Speaking with the future in mind, we are hoping to get a Gora 0.2
> > release out of the door once a licensing issue (the only blocker) and a
> > few other things are dealt with. Would it therefore be realistic to aim
> > for a Nutch 2.0 release shortly after that?
> >
> > My justification for raising this thread again is that we are seeing
> > (some) more users interested in this branch/code, and I think it is a
> > real shame that we have not been able to get a release out yet. I would
> > really like to get more people using the code and hopefully getting
> > involved in identifying bugs, and fixing them if possible.
> >
> > The question has been open for ages, so I just wonder if anything has
> > changed now that Gora has been doing better recently.
> >
> > Thanks
> >
> > Lewis
> >
> > --
> > Lewis
> >

Re: [DISCUSS] Nutchgora 2.0 release

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Lewis,

I'd be +1 to roll a Nutchgora 2.0 release.

I could see dealing with this in two ways, neither of which I like better than the other:

1. Release the nutchgora branch as "apache-nutch-2.0", so that nutchgora becomes
the 2.0 branch of the system (and we could create branch-2.0). As the 1.x trunk branch evolves and its version numbers approach
2.0, its last release would be 1.9, and then we do 3.0, which could be either:
  - a merge or combination of 1.x features and 2.x features
  - simply the next path for 1.x, and independent of 2.x

2. Call the artifact "apache-nutchgora-2.0", independent of the current trunk artifact and its release cycle.

Either way is fine with me.

Cheers,
Chris

On Feb 17, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:

> Hi Guys,
> 
> Here we are again :0)
> 
> What are people's thoughts on aiming for a 2.0 release? We have one blocking issue, the webapp, about which I got no response from the community at large. I would like to see this addressed, but that is a separate issue.
> 
> Speaking with the future in mind, we are hoping to get a Gora 0.2 release out of the door once a licensing issue (the only blocker) and a few other things are dealt with. Would it therefore be realistic to aim for a Nutch 2.0 release shortly after that?
> 
> My justification for raising this thread again is that we are seeing (some) more users interested in this branch/code, and I think it is a real shame that we have not been able to get a release out yet. I would really like to get more people using the code and hopefully getting involved in identifying bugs, and fixing them if possible.
> 
> The question has been open for ages, so I just wonder if anything has changed now that Gora has been doing better recently.
> 
> Thanks
> 
> Lewis
> 
> -- 
> Lewis 
> 

