Posted to user@lucy.apache.org by Octavian Rasnita <or...@gmail.com> on 2011/08/06 20:43:18 UTC

[lucy-user] Stable version?

Hi all,

I am looking for the best Perl module I can use to create a search engine, and for the moment I have settled on KinoSearch, which looks nice (and in addition, I was able to install it under Windows very easily using ppm). But I read that the project has moved to Apache Lucy.

So I thought that the best idea would be to start with Lucy rather than KinoSearch. In Lucy's POD documentation I read that it is unstable for the moment, but that there will be a stable Lucy1 fork, which does not exist yet (there is also no PPM package for Lucy yet, and it gives errors when I try to install it under Windows using cpan).

What do you recommend? Should I go ahead with KinoSearch, or is it better to fight some more to get Lucy working (because KinoSearch has unsolved issues, because it will be discontinued, or for other reasons)?

Thanks.

Octavian

PS. I will use the module under Linux, but Windows is the development platform.

Re: [lucy-user] Stable version?

Posted by Octavian Rasnita <or...@gmail.com>.
From: "Marvin Humphrey" <ma...@rectangular.com>
> On Sun, Aug 07, 2011 at 11:56:54PM +0300, Octavian Rasnita wrote:
>> Is Lucy able to index and search for UTF-8 encoded documents? (because if I
>> understood right, KinoSearch can't do that.)
> 
> That problem afflicts KinoSearch 0.1x, but not KinoSearch 0.3x or Lucy.


This is great! I read that KinoSearch 0.3x supports Romanian - the language I need right now, and I hope Lucy also supports it.

I have tried to install KinoSearch 0.3, but I wasn't able to do it using CPAN because it gave the error below, so I will definitely need to wait for the new Lucy release:
error building dll file from 'core/KinoSearch/Test.c' at E:/usr/site/lib/ExtUtils/CBuilder/Platform/Windows.pm line 130, <DATA>
CREAMYG/KinoSearch-0.313.tar.gz 


>> And, is it possible to run 2 or more separate processes in parallel that
>> index new data at the same time?
> 
> There is a single write lock, which is held from Indexer->new through
> Indexer->commit.  Multiple processes attempting to write to the same index
> will likely experience lock contention.
> 
> There are multiple strategies for managing this limitation, such as queuing
> or utilizing Lucy::Index::BackgroundMerger, but the limitation remains.


Well, I ran a test and indexed more than 100,000 documents in less than 10 minutes using KinoSearch 0.1, so indexing is fast enough for my needs.
In conclusion, it is not a big problem that only a single process can write to the index at a time.

Octavian


Re: [lucy-user] Stable version?

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Aug 07, 2011 at 11:56:54PM +0300, Octavian Rasnita wrote:
> Is Lucy able to index and search for UTF-8 encoded documents? (because if I
> understood right, KinoSearch can't do that.)

That problem afflicts KinoSearch 0.1x, but not KinoSearch 0.3x or Lucy.

> And, is it possible to run 2 or more separate processes in parallel that
> index new data at the same time?

There is a single write lock, which is held from Indexer->new through
Indexer->commit.  Multiple processes attempting to write to the same index
will likely experience lock contention.

There are multiple strategies for managing this limitation, such as queuing
or utilizing Lucy::Index::BackgroundMerger, but the limitation remains.

Marvin Humphrey
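
The queuing strategy mentioned above can be illustrated with a short, language-neutral sketch. It is written in Python rather than Perl and does not use Lucy's API: `SingleWriterIndex`, `writer_loop`, and `index_in_parallel` are hypothetical names invented for this illustration, threads stand in for separate processes, and a plain in-memory list stands in for the index. The idea is that many producers hand documents to a queue, while one writer takes the write lock once per batch, so the producers never compete for it.

```python
import queue
import threading


class SingleWriterIndex:
    """Toy stand-in for an index that permits one writer session at a time."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self.docs = []
        self.commits = 0

    def write_batch(self, batch):
        # The lock is held for the whole session, mirroring a write lock
        # that is held from writer creation through commit.
        with self._write_lock:
            self.docs.extend(batch)
            self.commits += 1


def writer_loop(index, pending, batch_size):
    """Drain the queue from a single thread, committing in batches."""
    batch = []
    while True:
        doc = pending.get()
        if doc is None:  # sentinel: producers are finished
            break
        batch.append(doc)
        if len(batch) >= batch_size:
            index.write_batch(batch)
            batch = []
    if batch:  # commit whatever is left over
        index.write_batch(batch)


def index_in_parallel(num_producers=4, docs_per_producer=25, batch_size=10):
    """Run several producers that feed one writer through a queue."""
    index = SingleWriterIndex()
    pending = queue.Queue()
    writer = threading.Thread(
        target=writer_loop, args=(index, pending, batch_size)
    )
    writer.start()

    def produce(producer_id):
        for n in range(docs_per_producer):
            pending.put({"id": f"{producer_id}-{n}", "content": f"document {n}"})

    producers = [
        threading.Thread(target=produce, args=(i,)) for i in range(num_producers)
    ]
    for t in producers:
        t.start()
    for t in producers:
        t.join()

    pending.put(None)  # tell the writer there is nothing more to index
    writer.join()
    return index
```

Calling `index_in_parallel()` returns the toy index after all producers have finished. In a real deployment the queue would have to be something that works across processes (a database table, a spool directory, a message broker), which is a design choice this sketch leaves open.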


Re: [lucy-user] Stable version?

Posted by Octavian Rasnita <or...@gmail.com>.
Hi Marvin,

From: "Marvin Humphrey" <ma...@rectangular.com>
> On Sat, Aug 06, 2011 at 09:43:18PM +0300, Octavian Rasnita wrote:
>> I am researching for finding the best Perl module I can use to create a
>> search engine, and for the moment I stopped at KinoSearch which looks nice
>> (and in addition I was able to install it under Windows very easily using ppm).
>> But I read that that project moved to Apache Lucy.
> 
> Use Lucy.
> 
> KinoSearch is no longer being developed.  Only old versions of KinoSearch --
> the 0.1x line -- are available via PPM, and they contain code 4-5 years behind
> Lucy.
> 
> Apache Lucy (incubating) version 0.2.0, which is likely to be released 3-4
> days from now, addresses numerous portability problems, including several
> affecting Windows.  It has been verified to build and test successfully under
> ActivePerl, Strawberry Perl and Cygwin.


Great! In that case I will try to install it using cpan after the new release.

There are 2 more things I would like to know.
Is Lucy able to index and search UTF-8 encoded documents? (If I understood correctly, KinoSearch can't do that.)
And is it possible to run 2 or more separate processes in parallel that index new data at the same time?

Thanks.

Octavian


Re: [lucy-user] Stable version?

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Aug 06, 2011 at 09:43:18PM +0300, Octavian Rasnita wrote:
> I am researching for finding the best Perl module I can use to create a
> search engine, and for the moment I stopped at KinoSearch which looks nice
> (and in addition I was able to install it under Windows very easily using ppm).
> But I read that that project moved to Apache Lucy.

Use Lucy.

KinoSearch is no longer being developed.  Only old versions of KinoSearch --
the 0.1x line -- are available via PPM, and they contain code 4-5 years behind
Lucy.

Apache Lucy (incubating) version 0.2.0, which is likely to be released 3-4
days from now, addresses numerous portability problems, including several
affecting Windows.  It has been verified to build and test successfully under
ActivePerl, Strawberry Perl and Cygwin.

> So I thought that the best idea would be to also start with Lucy and not
> KinoSearch. On Lucy POD doc I read that for the moment it is unstable, but
> that there will be a Lucy1 stable fork which for the moment doesn't exist
> (and there is no PPM package for it yet and it gives errors when trying to
> install it under Windows using cpan).

Lucy's code base is mature and solid enough to run in production, though as
the documentation says, it is officially API-unstable.

At some point, we plan to release Lucy1 as a stable fork.  IMO, it could
happen at any time, but as of yet no volunteer has stepped forward to do the
work of preparing the fork.

Best,

Marvin Humphrey