Posted to user@lucy.apache.org by "jerry@indolo.com" <je...@indolo.com> on 2012/05/06 08:54:48 UTC

[lucy-user] Nutch Index compatibility with Lucy?



I have used Nutch for the last couple of years, mostly to maintain an index
and search for a website. I am now looking at using Lucy instead,
mostly because of the Perl interface. I wanted to find out if Nutch indexes
will work with Lucy since they are both extended from the Lucene
project. From what I can see, Lucy does not include the crawling/fetching
features of Nutch, but my new site is using all Perl with Catalyst MVC. I
want to move away from maintaining a web server and a Java servlet
container.

Thanks for any information,
Jerry

Re: [lucy-user] Nutch Index compatibility with Lucy?

Posted by Peter Karman <pe...@peknet.com>.
jerry@indolo.com wrote on 5/6/12 1:54 AM:
> 
> 
> 
> I have used Nutch for the last couple of years mostly to maintain an index
> and search website. I am looking however to start looking at using Lucy
> mostly because of the Perl interface. I wanted to find out if Nutch indexes
> will work with Lucy since they are both extended from the Lucene
> project. From what I can see, Lucy does not include the crawling/fetching
> features of Nutch, but my new site is using all Perl with Catalyst MVC. I
> want to move away from maintaining a web server and a Java servlet
> container.

Hi Jerry,

It would be more accurate to say that Lucy is "inspired by" Lucene rather than
derived from or based on Lucene. Unlike Plucene or CLucene or any of the other
ports, Lucy has never tried to be index-compatible with Lucene. Only the class
structure and some of the architectural design are similar to Lucene. Hence the
'loose' designation.

Swish3[0] -- which is written all in Perl -- provides some of the features of
Nutch, notably a web crawler and document conversion (.pdf, .doc, .xls, etc).

There is a Lucy backend[1] for Swish3.

The Dezi[2] platform gives a REST interface to Swish3 indexes.

Here's an example:

% swish3 -S spider -F lucy -i http://www.peknet.com/ -f dezi.index
% dezi &
% dezi-client -q peknet
--
 uri: http://www.peknet.com/
 title: <b class="h">peknet</b> :: an eddy in the bit stream
 score: 91
========================================
       hits: 1
search time: 0.06957
 build time: 0.14598
      query: peknet



[0] http://swish-e.org/swish3/
[1] https://metacpan.org/module/SWISH::Prog::Lucy
[2] http://dezi.org/

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [lucy-user] Nutch Index compatibility with Lucy?

Posted by Chris Hostetter <ho...@fucit.org>.
: and abandon moving to Perl/Catalyst. It is too bad that all of the Lucene
: based projects are disjointed islands and you are forced to basically stick
: with only one technology once you start.

a) I'm not sure what you mean by "disjointed" .. any project using Lucene 
Core has an index that should be readable by any other project using 
Lucene Core -- there will just be some caveats you have to keep in mind 
(i.e., if you want to use Solr with an index you build elsewhere, you have 
to configure Solr with an appropriate schema.xml)

b) Lucy doesn't use Lucene Core - so it's a completely separate thing

c) The current versions of Nutch, last I heard, do not even build indexes 
directly using the Lucene Core -- instead Nutch focuses on the crawling, 
and then uses an indexer to push the crawled/parsed documents to Solr for 
searching -- so you could probably use Nutch with Lucy very easily by 
hooking into that pipeline.  Either replace the "SolrIndexer" in Nutch 
with something that writes directly to a Lucy index, or use the 
"SolrIndexer" as is and write a little app that emulates the Solr HTTP 
interface and writes to Lucy...

http://wiki.apache.org/nutch/bin/nutch%20solrindex
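
A rough sketch of the second option, for illustration only. It assumes the
indexer posts documents in Solr's XML update format
(<add><doc><field name="...">...</field></doc></add>); a real client such as
SolrJ may send or expect binary (javabin) messages instead, which this does
not handle. It is written in Python purely to show the shape of the shim.
write_to_lucy(), UpdateHandler and serve() are hypothetical names, and the
call into Lucy itself is left as a stub.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_solr_add(xml_body):
    """Turn a Solr XML update message into a list of {field: [values]} dicts."""
    root = ET.fromstring(xml_body)
    docs = []
    for doc in root.iter("doc"):
        fields = {}
        for field in doc.findall("field"):
            # Solr fields may repeat (multi-valued), so collect values in a list.
            fields.setdefault(field.get("name"), []).append(field.text or "")
        docs.append(fields)
    return docs


def write_to_lucy(doc):
    """Hypothetical hook: replace with whatever actually writes the Lucy index."""
    print("would index:", doc)


class UpdateHandler(BaseHTTPRequestHandler):
    """Accepts POSTed update messages on any path and acknowledges them."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Messages without <doc> elements (e.g. <commit/>) yield no documents.
        for doc in parse_solr_add(body):
            write_to_lucy(doc)
        # Placeholder acknowledgement, not a faithful Solr response.
        reply = b'<?xml version="1.0" encoding="UTF-8"?><response/>'
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(port=8983):
    """Listen on Solr's usual default port; call this to start the shim."""
    HTTPServer(("127.0.0.1", port), UpdateHandler).serve_forever()
```

Calling serve() and pointing the indexer at http://127.0.0.1:8983/ would
exercise the parsing path; the response handling would almost certainly need
work before a real Solr client accepted it.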


-Hoss


Re: [lucy-user] Nutch Index compatibility with Lucy?

Posted by "jerry@indolo.com" <je...@indolo.com>.
Thanks Marvin,

  Since we are early into this project, we may just jump back to Java only
and abandon moving to Perl/Catalyst. It is too bad that all of the Lucene
based projects are disjointed islands and you are forced to basically stick
with only one technology once you start.

Thanks,
Jerry




On May 6, 2012 at 3:32 PM Marvin Humphrey <ma...@rectangular.com> wrote:

> On Sat, May 5, 2012 at 11:54 PM, jerry@indolo.com <je...@indolo.com> wrote:
> > I wanted to find out if Nutch indexes will work with Lucy since they are
> > both extended from the Lucene project.
>
> They are not compatible, and there are no plans to establish such
> compatibility.
>
> > From what I can see, Lucy does not include the crawling/fetching features of
> > Nutch, but my new site is using all Perl with Catalyst MVC. I want to move
> > away from maintaining a web server and a Java servlet container.
>
> It's a common and worthwhile motivation.  The company I work for is mainly a
> Perl shop, and switched from Lucene to Lucy.  Having the search library API in
> the main language of our devs and codebase allowed for faster iteration during
> development, more seamless and deeper integration, and more nimble deployment
> and troubleshooting.
>
> Marvin Humphrey

Re: [lucy-user] Nutch Index compatibility with Lucy?

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, May 5, 2012 at 11:54 PM, jerry@indolo.com <je...@indolo.com> wrote:
> I wanted to find out if Nutch indexes will work with Lucy since they are
> both extended from the Lucene project.

They are not compatible, and there are no plans to establish such
compatibility.

> From what I can see, Lucy does not include the crawling/fetching features of
> Nutch, but my new site is using all Perl with Catalyst MVC. I want to move
> away from maintaining a web server and a Java servlet container.

It's a common and worthwhile motivation.  The company I work for is mainly a
Perl shop, and switched from Lucene to Lucy.  Having the search library API in
the main language of our devs and codebase allowed for faster iteration during
development, more seamless and deeper integration, and more nimble deployment
and troubleshooting.

Marvin Humphrey