Posted to user@nutch.apache.org by RONNY <ro...@mputa.com> on 2008/10/22 01:50:22 UTC

Re: Is Nutch Still Active?

Nutch is too young a project to die; the men are finalizing version 1.0.
Ronny


John Martyniak wrote:
> Hi,
>
> I have been playing around with Nutch for a little while, and I see a 
> ton of emails on the mailing lists, but there hasn't been a formal 
> build in more than a year.
>
> Are there any plans? Is this project still being worked on?
>
> Any thoughts would be greatly appreciated.
>
> -John
>
>


Re: Is Nutch Still Active?

Posted by Dennis Kubes <ku...@apache.org>.

John Martyniak wrote:
> Dennis,
> 
> Thanks for the information.
> 
> Can you tell me what the benefit of integrating with SOLR would be?  It 
> seems to me that the only gap between the two is that Nutch has a 
> Spider, and SOLR has incremental index, query warming, etc.

It is really about size of data and type of usage.  Nutch is
specifically for web search, while Solr is IMO better for enterprise and
restricted-domain search.  Nutch uses MapReduce throughout; Solr
doesn't (although indexes can be created in MR and served by Solr).
Nutch has a crawler; Solr doesn't.  Nutch has a distributed search
server.  Solr is working towards the same type of distributed search
model.  I think the biggest difference in terms of ideology is that web
search is batch oriented (do a crawl, then process, analyze, and index
it), while enterprise search is closer to real-time updates and dynamic
changes.

So there are significant differences even though they can work together.
The current integration work is to allow indexes created by Nutch to
be served by Solr.  If your domain is creating a full-text search from a
database, or something like radius or location search, I would use Solr.
If you want to create a large WWW or vertical search engine, I would
use Nutch.  If you have a large amount of data to crawl and/or process
and still want to integrate with a database, I would use Nutch / Hadoop
to acquire and process the data and Solr to serve it.

> 
> And the approximate timing of the next release?

Well, we were going to release 1.0 when Hadoop released 1.0.  They were
planning on doing that after version 0.17.  But they have continued
along the path to version 0.20, so I don't exactly know when a 1.0
release for Hadoop would be.  My guess, although there are no hard and
firm plans, is within the next 1-2 months.  Many patches are complete now
and need to be integrated, then left to sit for a month or so to work out
any bugs.

Dennis

> 
> -John

Re: Is Nutch Still Active?

Posted by John Martyniak <jo...@beforedawn.com>.
Dennis,

Thanks for the information.

Can you tell me what the benefit of integrating with SOLR would be?   
It seems to me that the only gap between the two is that Nutch has a  
Spider, and SOLR has incremental index, query warming, etc.

And the approximate timing of the next release?

-John

On Oct 22, 2008, at 1:29 PM, Dennis Kubes wrote:

> We have been working on major feature upgrades for version 1.  That  
> took some time.  It includes things like a new scoring framework, a
> new indexing framework, serving search results in XML and JSON,
> integration with SOLR and HBase, among others.  Not dead, just busy.
>
> Dennis


Re: Is Nutch Still Active?

Posted by Dennis Kubes <ku...@apache.org>.
We have been working on major feature upgrades for version 1.  That took
some time.  It includes things like a new scoring framework, a new
indexing framework, serving search results in XML and JSON, integration
with SOLR and HBase, among others.  Not dead, just busy.

Dennis

John Martyniak wrote:
> Ronny,
> 
> Thanks for the info.
> 
> Do you know what the approximate timing for that is (days, weeks,
> months)?  And also the feature set?
> 
> -John
> 
> On Oct 21, 2008, at 7:50 PM, RONNY wrote:
> 
>> Nutch is too young a project to die; the men are finalizing version 1.0.
>> Ronny

Re: Is Nutch Still Active?

Posted by John Martyniak <jo...@beforedawn.com>.
Ronny,

Thanks for the info.

Do you know what the approximate timing for that is (days, weeks,
months)?  And also the feature set?

-John

On Oct 21, 2008, at 7:50 PM, RONNY wrote:

> Nutch is too young a project to die; the men are finalizing version 1.0.
> Ronny