Posted to dev@nutch.apache.org by Doğacan Güney <do...@gmail.com> on 2010/06/26 21:04:29 UTC

Re: Nutch 2.0

Hi all,

On Sat, Jun 26, 2010 at 21:26, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi All,
>
> We just wanted to flush some of our discussions on Nutch 2.0 to the public
> mailing lists. A few key points:
>
> 1. Doğacan, Julien and Enis are proposing to base Nutch 2.0 on GORA, a
> technology that can be viewed here [1].
>
> The upshot of GORA is that it's an ORM layer that abstracts away backend
> stores like Cassandra or SQL or any other database (though Gora works
> better with NoSQL stores).
>
> A couple of questions I've got:
>
>  a) is GORA ASL licensed?
>

I think it is already ASL-licensed. Even if it isn't (which would mean we
forgot to do so), we have no problem making it ASL-licensed.


>  b) what's the maintenance plan for GORA? Will it continue to live in
> Github? Will you guys propose it into the Apache Incubator as an ASF
> project?
>
>
That all depends on the amount of interest GORA generates. If there is
interest, we will be more than happy to propose it as an incubator project.


> 2. Development on Nutchbase has occurred at Github [2] since Doğacan
> originally checked it into Nutch SVN under a branch [3] at the ASF. I
> expressed some concerns about this since it's hard to review huge sweeping
> patches and since a lot of development has occurred off of the public
> Apache mailing lists. Specifically I asked Doğacan et al. to enumerate a
> list of the changes in the Git version of Nutchbase [2] versus the ASF
> version [3]. We then need to come up with a plan of how to merge the 2 and
> get the latest into ASF SVN. Doğacan estimates the difference between the
> Nutchbase branch at the ASF [3] and that of Github [2] to be ~25 hrs of
> work. Doğacan generated this list of major changes that have happened at
> Github and not at Apache:
>

> ---snip
> 1) Porting nutchbase to GORA: This was discussed in issues NUTCH-808
> and NUTCH-811.
> 2) Using ivy in nutch:  NUTCH-821 and NUTCH-825
> 3) Removal of nutch's custom developed search code (and using SOLR
> instead). IIRC, this was also discussed and accepted by nutch community.
> However, if not, we can simply put this code back (since this was a
> trivial delete).
> ---snip
>
> So, that really brings everyone up to speed I think. So, that said, I am +1
> for moving forward on #1 above, provided we address the 2 questions I
> listed (a+b). We need to understand it from a Nutch perspective. As for #2,
> we can rectify it by doing the following things:
>
>     (a) svn copy NutchBase from GitHub to the nutchbase branch in
> http://svn.apache.org/repos/asf/nutch/branches/nutchbase bringing the ASF
> branch up to date.
>     (b) Once the GORA license issues are figured out (they must be
> compatible with the ASF or we cannot use it), then we update Nutch to
> depend on the GORA jars via Ivy?
>     (c) svn tag current Nutch trunk as 1.2-branch
>     (d) svn merge nutchbase branch with nutch trunk
>     (e) roll the version # in nutch trunk to 2.0-dev
>     (f) all issues in JIRA should be updated to reflect 2.0-dev fixes where
> it makes sense
>     (g) a 2.1 version is added to mark anything that we don't want in 2.0
> and we file post 2.0 issues there
>     (h) Nutch 2.0 trunk is fixed, and brought up to speed and old code is
> removed. All unit tests should pass regression where it makes sense.
>     (i) Nutch documentation is brought up to date on wiki and checked into
> SVN
>     (j) We roll a 2.0 release
>
> That sound good to everyone?
>
>
Sounds good.


> Cheers,
> Chris
>
> [1] http://github.com/enis/gora
> [2] http://github.com/dogacan/nutchbase
> [3] http://svn.apache.org/repos/asf/nutch/branches/nutchbase/
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>


-- 
Doğacan Güney