Posted to dev@nutch.apache.org by Doğacan Güney <do...@gmail.com> on 2010/07/10 15:00:16 UTC

Merging in nutchbase

Hey everyone,

I would like to start merging nutchbase into trunk, so I am hoping to get
everyone's comments and suggestions on how to do that.

Some of the other changes in nutchbase (such as deleting Nutch's own
indexing system) have already been incorporated into Nutch trunk, so I think
the difference between nutchbase and Nutch trunk has been reduced to the
scope of NUTCH-650 and NUTCH-811, i.e., abstracting storage away from Nutch.

Unfortunately, AFAICS, there is no easy way to split NUTCH-650 into
smaller patches. All Nutch jobs and all plugins have to be updated to use
the new <String, WebPage> API, and it needs to be done at once. So if no one
has any objections, I want to create a gigantic patch that applies to current
trunk, attach it to NUTCH-650, and commit it soon. (I want to do this quickly
because Nutch development speed is picking up again, and I am worried that
issues like NUTCH-843, while making perfect sense, will wreak havoc on my
merging efforts :)
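
For readers unfamiliar with the idea, here is a rough sketch of what "abstracting storage away" behind a <String, WebPage> API could look like. It is an illustration only, not the nutchbase or Gora code: the `WebPage` class below is a simplified stand-in, and `PageStore`, `InMemoryPageStore`, and `countFetched` are hypothetical names invented for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class StorageSketch {

    /** Hypothetical stand-in for a page record; NOT Nutch's actual WebPage class. */
    static final class WebPage {
        final String status;   // e.g. "fetched" or "unfetched" (illustrative values)
        final String content;

        WebPage(String status, String content) {
            this.status = status;
            this.content = content;
        }
    }

    /** Hypothetical storage abstraction: jobs only ever see String keys and WebPage values. */
    interface PageStore {
        void put(String url, WebPage page);
        Optional<WebPage> get(String url);
    }

    /** In-memory backend, standing in for HBase, SQL, or any other real store. */
    static final class InMemoryPageStore implements PageStore {
        private final Map<String, WebPage> rows = new HashMap<>();

        @Override
        public void put(String url, WebPage page) {
            rows.put(url, page);
        }

        @Override
        public Optional<WebPage> get(String url) {
            return Optional.ofNullable(rows.get(url));
        }
    }

    /** A toy "job" written against the interface only, so the backend can be swapped. */
    static int countFetched(PageStore store, List<String> urls) {
        int fetched = 0;
        for (String url : urls) {
            Optional<WebPage> page = store.get(url);
            if (page.isPresent() && "fetched".equals(page.get().status)) {
                fetched++;
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        PageStore store = new InMemoryPageStore();
        store.put("http://example.org/", new WebPage("fetched", "<html>home</html>"));
        store.put("http://example.org/a", new WebPage("unfetched", ""));

        List<String> urls = Arrays.asList(
                "http://example.org/", "http://example.org/a", "http://example.org/missing");
        System.out.println("fetched pages: " + countFetched(store, urls));
    }
}
```

Because the toy job depends only on the interface, changing the backend would not touch the job itself. That is also a plausible reason the real change is hard to split: every job and plugin has to move to the new key/value types together.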

What does everyone think?

-- 
Doğacan Güney

Re: Merging in nutchbase

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-07-10 17:01, Doğacan Güney wrote:
> Hey everyone,
>
> On Sat, Jul 10, 2010 at 17:43, Mattmann, Chris A (388J)<
> chris.a.mattmann@jpl.nasa.gov>  wrote:
>
>>   Hey Guys,
>>
>> +1 to Andrzej’s suggestion. I mostly run small scale stuff with Nutch, so
>> unless I can run HBase in small scale (or better yet, an embedded SQL db), I
>> won’t be as much use! :)
>>
>>
> I just want to make clear that this is, indeed, a goal I share. Gora already
> has an SQL backend that can use embedded hsqldb. However, there are some
> weird bugs (I really hate SQL :), but once I am done fixing all bugs (which
> I will be doing today and tomorrow), nutch will run on gora - (embedded
> hsqldb) with zero configuration.

Excellent, that would be a real breakthrough.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Merging in nutchbase

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
That’s b/c you rock, Doğacan, thanks!

Cheers,
Chris


On 7/10/10 8:01 AM, "Doğacan Güney" <do...@gmail.com> wrote:

Hey everyone,

On Sat, Jul 10, 2010 at 17:43, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
Hey Guys,

+1 to Andrzej’s suggestion. I mostly run small scale stuff with Nutch, so unless I can run HBase in small scale (or better yet, an embedded SQL db), I won’t be as much use! :)


I just want to make clear that this is, indeed, a goal I share. Gora already has an SQL backend that can use embedded hsqldb. However, there are some weird bugs (I really hate SQL :), but once I am done fixing all bugs (which I will be doing today and tomorrow), nutch will run on gora - (embedded hsqldb) with zero configuration.

Cheers,
Chris



On 7/10/10 7:28 AM, "Andrzej Bialecki" <ab...@getopt.org> wrote:

On 2010-07-10 15:24, Julien Nioche wrote:
> I agree with Andrzej that the SQL backend has to be checked and tested on
> nutchbase before we can start porting it to the trunk.

> Moreover I have
> raised an important design issue on the list recently (table per fetchround)
> which needs some changes to Gora first and must be discussed, implemented
> and tested in NutchBase before we port it to trunk

This could go either way, whichever is more convenient - I don't see it
as something to necessarily withhold the merge. Without the first issue,
though, we lose the ability to develop, test and run in local mode...


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++







Re: Merging in nutchbase

Posted by Doğacan Güney <do...@gmail.com>.
Hey everyone,

On Sat, Jul 10, 2010 at 17:43, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

>  Hey Guys,
>
> +1 to Andrzej’s suggestion. I mostly run small scale stuff with Nutch, so
> unless I can run HBase in small scale (or better yet, an embedded SQL db), I
> won’t be as much use! :)
>
>
I just want to make clear that this is, indeed, a goal I share. Gora already
has an SQL backend that can use embedded hsqldb. However, there are some
weird bugs (I really hate SQL :), but once I am done fixing all bugs (which
I will be doing today and tomorrow), nutch will run on gora - (embedded
hsqldb) with zero configuration.


> Cheers,
> Chris
>
>
>
> On 7/10/10 7:28 AM, "Andrzej Bialecki" <ab...@getopt.org> wrote:
>
> On 2010-07-10 15:24, Julien Nioche wrote:
> > I agree with Andrzej that the SQL backend has to be checked and tested on
> > nutchbase before we can start porting it to the trunk.
>
> > Moreover I have
> > raised an important design issue on the list recently (table per
> fetchround)
> > which needs some changes to Gora first and must be discussed, implemented
> > and tested in NutchBase before we port it to trunk
>
> This could go either way, whichever is more convenient - I don't see it
> as something to necessarily withhold the merge. Without the first issue,
> though, we lose the ability to develop, test and run in local mode...
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
Doğacan Güney

Re: Merging in nutchbase

Posted by Julien Nioche <li...@gmail.com>.
Hi Doğacan,

Thanks for the update.


> While I agree with the "table per fetch" issue, I would like to postpone it
> until after the merge. This issue is tricky for a couple of reasons. For
> example, AFAIK, cassandra's latest released version
> does not support live schema updates so you can not add/delete tables on a
> running cassandra machine. I guess we can use super columns as our tables,
> then use columns to store
> data but that may be sub-optimal.
>

Interesting point. I'm OK with your suggestion to postpone it and port the
trunk to Gora once we have managed to get nutchbase working with the SQL
backend. My assumption was that this was doable with all the backends as-is,
but since that's not the case, it's better to move on and come back to it
later. As AB also pointed out earlier in the discussion, we'd need to measure
the performance impact of 'table per fetch' compared to the current design,
which we could do using HBase a bit later.

Thanks

J.
-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com

Re: Merging in nutchbase

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Doğacan,

>> This could go either way, whichever is more convenient - I don't see it as
>> something to necessarily withhold the merge. Without the first issue, though,
>> we lose the ability to develop, test and run in local mode...
>> 
> 
> While I agree with the "table per fetch" issue, I would like to postpone it
> until after the merge. This issue is tricky for a couple of reasons. For
> example, AFAIK, cassandra's latest released version
> does not support live schema updates so you can not add/delete tables on a
> running cassandra machine.
> [...]

I read this too, here:

http://nosql.mypopescu.com/post/789235609/cassandra-and-hbase-compared

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Re: Merging in nutchbase

Posted by Doğacan Güney <do...@gmail.com>.
On Sat, Jul 10, 2010 at 17:28, Andrzej Bialecki <ab...@getopt.org> wrote:

> On 2010-07-10 15:24, Julien Nioche wrote:
>
>> I agree with Andrzej that the SQL backend has to be checked and tested on
>> nutchbase before we can start porting it to the trunk.
>>
>
>  Moreover I have
>> raised an important design issue on the list recently (table per
>> fetchround)
>> which needs some changes to Gora first and must be discussed, implemented
>> and tested in NutchBase before we port it to trunk
>>
>
> This could go either way, whichever is more convenient - I don't see it as
> something to necessarily withhold the merge. Without the first issue,
> though, we lose the ability to develop, test and run in local mode...
>
>
While I agree with the "table per fetch" issue, I would like to postpone it
until after the merge. This issue is tricky for a couple of reasons. For
example, AFAIK, Cassandra's latest released version does not support live
schema updates, so you cannot add or delete tables on a running Cassandra
machine. I guess we can use super columns as our tables, then use columns to
store data, but that may be sub-optimal.
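
To make the super-column idea concrete, here is a toy model of it under stated assumptions. It uses nested in-memory maps rather than any Cassandra API, and the layout (row key, then super column as the per-fetch-round "table", then column) is only one possible reading of the suggestion above. `SingleTableStore`, `keysInRound`, and the "round-N" names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FetchRoundSketch {

    /**
     * One physical table that is never created or dropped at runtime.
     * Layout (an assumption for illustration): row key -> super column -> column -> value,
     * where each super column plays the role of one per-fetch-round "table".
     */
    static final class SingleTableStore {
        private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

        void put(String rowKey, String round, String column, String value) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
                .computeIfAbsent(round, k -> new TreeMap<>())
                .put(column, value);
        }

        /** Returns the stored value, or null if the row, round, or column is absent. */
        String get(String rowKey, String round, String column) {
            Map<String, Map<String, String>> rounds = rows.get(rowKey);
            if (rounds == null) {
                return null;
            }
            Map<String, String> columns = rounds.get(round);
            return columns == null ? null : columns.get(column);
        }

        /** Lists row keys with data in a round; in this toy layout that visits every row. */
        List<String> keysInRound(String round) {
            List<String> keys = new ArrayList<>();
            for (Map.Entry<String, Map<String, Map<String, String>>> row : rows.entrySet()) {
                if (row.getValue().containsKey(round)) {
                    keys.add(row.getKey());
                }
            }
            return keys;
        }
    }

    public static void main(String[] args) {
        SingleTableStore store = new SingleTableStore();
        store.put("http://example.org/", "round-1", "status", "fetched");
        store.put("http://example.org/", "round-2", "status", "refetched");
        store.put("http://example.org/a", "round-2", "status", "fetched");

        System.out.println(store.get("http://example.org/", "round-1", "status"));
        System.out.println(store.keysInRound("round-2"));
    }
}
```

In this toy layout no schema change is needed to start a new round, but collecting one round's rows means walking every row. Whether a real backend would pay a similar cost is exactly the kind of thing that would need measuring.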

The SQL backend, as mentioned below, is almost done. There is a weird bug
where reads do not return what I just wrote. Once I figure out what's wrong,
I think it will be good to go.


> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
Doğacan Güney

Re: Merging in nutchbase

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Guys,

+1 to Andrzej's suggestion. I mostly run small scale stuff with Nutch, so unless I can run HBase in small scale (or better yet, an embedded SQL db), I won't be as much use! :)

Cheers,
Chris


On 7/10/10 7:28 AM, "Andrzej Bialecki" <ab...@getopt.org> wrote:

On 2010-07-10 15:24, Julien Nioche wrote:
> I agree with Andrzej that the SQL backend has to be checked and tested on
> nutchbase before we can start porting it to the trunk.

> Moreover I have
> raised an important design issue on the list recently (table per fetchround)
> which needs some changes to Gora first and must be discussed, implemented
> and tested in NutchBase before we port it to trunk

This could go either way, whichever is more convenient - I don't see it
as something to necessarily withhold the merge. Without the first issue,
though, we lose the ability to develop, test and run in local mode...


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Merging in nutchbase

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-07-10 15:24, Julien Nioche wrote:
> I agree with Andrzej that the SQL backend has to be checked and tested on
> nutchbase before we can start porting it to the trunk.

> Moreover I have
> raised an important design issue on the list recently (table per fetchround)
> which needs some changes to Gora first and must be discussed, implemented
> and tested in NutchBase before we port it to trunk

This could go either way, whichever is more convenient - I don't see it 
as something to necessarily withhold the merge. Without the first issue, 
though, we lose the ability to develop, test and run in local mode...


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Merging in nutchbase

Posted by Julien Nioche <li...@gmail.com>.
I agree with Andrzej that the SQL backend has to be checked and tested on
nutchbase before we can start porting it to the trunk. Moreover I have
raised an important design issue on the list recently (table per fetchround)
which needs some changes to Gora first and must be discussed, implemented
and tested in NutchBase before we port it to trunk

J.

On 10 July 2010 14:12, Andrzej Bialecki <ab...@getopt.org> wrote:

> On 2010-07-10 15:00, Doğacan Güney wrote:
>
>> Hey everyone,
>>
>> I would like to start merging in nutchbase to trunk, so I am hoping to get
>> everyone's comments and suggestions on
>> how to do that.
>>
>
>
> Do we have any way to run the merged code without running HBase? I think
> that the SQL backend to Gora needs to be tested first with the nutchbase
> branch - otherwise the development and testing will become very difficult...
> So in my opinion we need to make sure we can use a small SQL backend (Derby
> or HSQL) before we start merging.
>
> As for the mechanics of the patching - yes, I think it needs to be done
> this way.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com

Re: Merging in nutchbase

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-07-10 15:00, Doğacan Güney wrote:
> Hey everyone,
>
> I would like to start merging in nutchbase to trunk, so I am hoping to get
> everyone's comments and suggestions on
> how to do that.


Do we have any way to run the merged code without running HBase? I think 
that the SQL backend to Gora needs to be tested first with the nutchbase 
branch - otherwise the development and testing will become very 
difficult... So in my opinion we need to make sure we can use a small 
SQL backend (Derby or HSQL) before we start merging.

As for the mechanics of the patching - yes, I think it needs to be done 
this way.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Merging in nutchbase

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
+1 Andrzej, we can call it prenutchbase, and create it immediately prior to merging Nutchbase in. Speaking of which, did we decide to do that this week, once Doğacan finishes getting Gora stable?

Thanks,
Chris



On 7/12/10 8:08 AM, "Andrzej Bialecki" <ab...@getopt.org> wrote:

On 2010-07-12 16:12, Mattmann, Chris A (388J) wrote:
> Hey Julien,
>
> Yep, I think we need to get a few more issues/bugfixes in there and then we can push a 1.2 out...

I think also that we should branch a version of trunk before the
nutchbase merge - we may later delete it if we wish, but Nutch as it is
now is still usable ;) and the merging may take a while...


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Merging in nutchbase

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-07-12 16:12, Mattmann, Chris A (388J) wrote:
> Hey Julien,
>
> Yep, I think we need to get a few more issues/bugfixes in there and then we can push a 1.2 out...

I think also that we should branch a version of trunk before the 
nutchbase merge - we may later delete it if we wish, but Nutch as it is 
now is still usable ;) and the merging may take a while...


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Merging in nutchbase

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Julien,

Yep, I think we need to get a few more issues/bugfixes in there and then we can push a 1.2 out...

Cheers,
Chris



On 7/12/10 7:10 AM, "Julien Nioche" <li...@gmail.com> wrote:

Hi guys,

We'll probably find minor improvements / bugfixes for 1.2 as we port things from NutchBase to trunk so I'd suggest we wait a bit before releasing it.

J.

On 12 July 2010 14:52, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
Hi Alex,

I was thinking of making a 1.2 release. Andrzej and I have been tracking and backporting some issues into that branch, see here:

http://svn.apache.org/repos/asf/nutch/branches/branch-1.2

So far, here are the CHANGES present in that branch:

---
* NUTCH-838 Add timing information to all Tool classes (Jeroen van Vianen, mattmann)

* NUTCH-835 Document deduplication failed using MD5Signature (Sebastian Nagel via ab)

* NUTCH-831 Allow configuration of how fields crawled by Nutch are stored / indexed /
  tokenized (Jeroen van Vianen via mattmann)

* NUTCH-278 Fetcher-status might need clarification: kbit/s instead of kb/s shown (Alex McLintock via mattmann)

* NUTCH-833 Website is still Lucene branded (mattmann, Alex McLintock)

* NUTCH-832 Website menu has lots of broken links - in particular the API docs (Alex McLintock via mattmann)
---

I was waiting to see if there was more general interest from at least one more committer in a point 1.2 release. If there was, I’d be happy to be the RM and push it out. That said, I think that it would likely be the last release in the 1.x series, since I (and the rest of the committers) want to focus our efforts on the 2.0 release.

Cheers,
Chris



On 7/12/10 5:36 AM, "Alex McLintock" <al...@gmail.com> wrote:

2010/7/10 Doğacan Güney <do...@gmail.com>:
> I would like to start merging in nutchbase to trunk, so I am hoping to get
> everyone's comments and suggestions on how to do that.


I'd like to know whether we intend to make a 1.12 release at any point
before the nutchbase version 2.0 release.
Presumably we could create a branch in subversion  for this purpose (
based upon the 1.11 release )?

In general I'd prefer to see nutchbase incorporated into head in
chunks - but I don't have any constructive suggestions for how to do
that.

Alex



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++







Re: Merging in nutchbase

Posted by Julien Nioche <li...@gmail.com>.
Hi guys,

We'll probably find minor improvements / bugfixes for 1.2 as we port things
from NutchBase to trunk so I'd suggest we wait a bit before releasing it.

J.

On 12 July 2010 14:52, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

>  Hi Alex,
>
> I was thinking of making a 1.2 release. Andrzej and I have been tracking
> and backporting some issues into that branch, see here:
>
> http://svn.apache.org/repos/asf/nutch/branches/branch-1.2
>
> So far, here are the CHANGES present in that branch:
>
> ---
> * NUTCH-838 Add timing information to all Tool classes (Jeroen van Vianen,
> mattmann)
>
> * NUTCH-835 Document deduplication failed using MD5Signature (Sebastian
> Nagel via ab)
>
> * NUTCH-831 Allow configuration of how fields crawled by Nutch are stored /
> indexed /
>   tokenized (Jeroen van Vianen via mattmann)
>
> * NUTCH-278 Fetcher-status might need clarification: kbit/s instead of kb/s
> shown (Alex McLintock via mattmann)
>
> * NUTCH-833 Website is still Lucene branded (mattmann, Alex McLintock)
>
> * NUTCH-832 Website menu has lots of broken links - in particular the API
> docs (Alex McLintock via mattmann)
> ---
>
> I was waiting to see if there was more general interest from at least one
> more committer in a point 1.2 release. If there was, I’d be happy to be the
> RM and push it out. That said, I think that it would likely be the last
> release in the 1.x series, since I (and the rest of the committers) want to
> focus our efforts on the 2.0 release.
>
> Cheers,
> Chris
>
>
>
> On 7/12/10 5:36 AM, "Alex McLintock" <al...@gmail.com> wrote:
>
> 2010/7/10 Doğacan Güney <do...@gmail.com>:
> > I would like to start merging in nutchbase to trunk, so I am hoping to
> get
> > everyone's comments and suggestions on how to do that.
>
>
> I'd like to know whether we intend to make a 1.12 release at any point
> before the nutchbase version 2.0 release.
> Presumably we could create a branch in subversion  for this purpose (
> based upon the 1.11 release )?
>
> In general I'd prefer to see nutchbase incorporated into head in
> chunks - but I don't have any constructive suggestions for how to do
> that.
>
> Alex
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>


-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com

Re: Merging in nutchbase

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Alex,

I was thinking of making a 1.2 release. Andrzej and I have been tracking and backporting some issues into that branch, see here:

http://svn.apache.org/repos/asf/nutch/branches/branch-1.2

So far, here are the CHANGES present in that branch:

---
* NUTCH-838 Add timing information to all Tool classes (Jeroen van Vianen, mattmann)

* NUTCH-835 Document deduplication failed using MD5Signature (Sebastian Nagel via ab)

* NUTCH-831 Allow configuration of how fields crawled by Nutch are stored / indexed /
  tokenized (Jeroen van Vianen via mattmann)

* NUTCH-278 Fetcher-status might need clarification: kbit/s instead of kb/s shown (Alex McLintock via mattmann)

* NUTCH-833 Website is still Lucene branded (mattmann, Alex McLintock)

* NUTCH-832 Website menu has lots of broken links - in particular the API docs (Alex McLintock via mattmann)
---

I was waiting to see if there was more general interest from at least one more committer in a point 1.2 release. If there was, I’d be happy to be the RM and push it out. That said, I think that it would likely be the last release in the 1.x series, since I (and the rest of the committers) want to focus our efforts on the 2.0 release.

Cheers,
Chris


On 7/12/10 5:36 AM, "Alex McLintock" <al...@gmail.com> wrote:

2010/7/10 Doğacan Güney <do...@gmail.com>:
> I would like to start merging in nutchbase to trunk, so I am hoping to get
> everyone's comments and suggestions on how to do that.


I'd like to know whether we intend to make a 1.12 release at any point
before the nutchbase version 2.0 release.
Presumably we could create a branch in subversion  for this purpose (
based upon the 1.11 release )?

In general I'd prefer to see nutchbase incorporated into head in
chunks - but I don't have any constructive suggestions for how to do
that.

Alex



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Merging in nutchbase

Posted by Alex McLintock <al...@gmail.com>.
2010/7/10 Doğacan Güney <do...@gmail.com>:
> I would like to start merging in nutchbase to trunk, so I am hoping to get
> everyone's comments and suggestions on how to do that.


I'd like to know whether we intend to make a 1.12 release at any point
before the nutchbase version 2.0 release.
Presumably we could create a branch in subversion  for this purpose (
based upon the 1.11 release )?

In general I'd prefer to see nutchbase incorporated into head in
chunks - but I don't have any constructive suggestions for how to do
that.

Alex