Posted to ruby-dev@lucene.apache.org by Max Nickel <ma...@oss-institute.org> on 2005/07/07 17:48:15 UTC

rise

Hi all,
Miles Barr and Erik Hatcher posted on my webby, and since I wanted to get
in contact with you sooner or later anyway, I'm doing it now :).
Originally I wanted to wait until I had some more quality code, but
well...
(So if you read something like "it's working" or "I ported", please don't
take this too literally ;))

So let me introduce myself first: I'm Max Nickel and I'm working on a
project I called Rise, which tries to be a Ruby implementation of Lucene.
I just read some of the recent posts on this mailing list, and it seems
that you are concentrating your efforts on getting it done with SWIG, so
I don't know if what I did will be of much use to you.
I took a different approach and first tried a pure Ruby implementation.
This was rise-0.1.1, which you can also get on rise.rubyforge.org or from
my outdated Arch repo. At that stage everything was still very buggy and
nowhere near what you could call working, but I had enough working to see
that pure Ruby is simply unacceptably slow (I expected this to happen
anyway).
So at this point I decided to port some of the parts that matter most
for performance to C. I know that this might not be the best
approach when you care about portability or deployment, but I felt it
was necessary if you want to do anything beyond indexing your address
book.
Right now I have ported the following classes, either completely or in
part as mixins:
FS/RAM-IO, Tokenizers up to LowerCaseTokenizer, Term, TermBuffer, Token,
QuickSort, HeapSort, TermInfosWriter#add and #write,
DocumentWriter#writePostings and #addPosition, and SegmentTermEnum +
some helper classes.
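
For readers who don't know the Lucene analysis chain: a LowerCaseTokenizer
splits text at non-letters and lowercases each token. A minimal pure-Ruby
sketch of that behaviour could look like the following. It is not Rise's
code; the class layout, the Token struct and its field names are assumptions
made for illustration only.

```ruby
# Hypothetical sketch, not taken from Rise.
# A token carries its text plus character offsets into the source string.
Token = Struct.new(:text, :start_offset, :end_offset)

class LowerCaseTokenizer
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Yields one Token per maximal run of letters, lowercased.
  def each
    return enum_for(:each) unless block_given?

    @text.scan(/[[:alpha:]]+/) do |word|
      start = Regexp.last_match.begin(0)
      yield Token.new(word.downcase, start, start + word.length)
    end
  end
end

tokens = LowerCaseTokenizer.new("Hello, Ruby World").to_a
puts tokens.map(&:text).join(" ")
```

Scanning character by character like this is the kind of inner loop that
tends to dominate indexing time, which is presumably why the tokenizers were
among the first candidates for a C port.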

The C implementations don't use any headers other than ruby.h or
rubyio.h (sys/stdlib.h is needed only once, in fsio.c), so everywhere
Ruby compiles, Rise should compile as well.
Also, nearly all classes except the IO ones aren't pure C, but make use
of Ruby's C functions like rb_ivar_*, rb_funcall etc.
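
To make the last point concrete, the Ruby-level operations below correspond
roughly to what rb_ivar_get, rb_ivar_set and rb_funcall do when called from
C. This is only a sketch: the Term class shown here is a hypothetical
stand-in, not the class from Rise.

```ruby
# Hypothetical Term class, used only for illustration.
class Term
  def initialize(field, text)
    @field = field
    @text = text
  end

  def to_s
    "#{@field}:#{@text}"
  end
end

term = Term.new("body", "lucene")

# Roughly what rb_ivar_get(term, rb_intern("@text")) does from C.
text = term.instance_variable_get(:@text)

# Roughly what rb_ivar_set(term, rb_intern("@text"), value) does from C.
term.instance_variable_set(:@text, text.upcase)

# Roughly what rb_funcall(term, rb_intern("to_s"), 0) does from C.
label = term.send(:to_s)
puts label
```

Because such calls still go through the interpreter's instance-variable
lookup and method dispatch, a class written this way saves less time than one
that keeps its state in plain C structs.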

As I wrote in an email to Miles Barr earlier, here are some very rough
indexing stats. Indexing /usr/src/linux of a recent 2.6.12 kernel on my
machine takes:
with Lucene ~4 minutes
with Rise in pure Ruby > 60 minutes
with my current Rise/C impl ~20 minutes.

The current state is unfortunately broken, since somewhere in my recent
changes I made some stupid mistake and keep getting "Docs out of
order" exceptions when merging segments. I haven't had much time on my
hands lately to hunt this bug, but I hope it will be the last major one
before the 0.1.2 release (except that the searching side is also broken,
as it hasn't been updated to the changes I made yet).
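
As background on that exception: when segments are merged, the document
numbers written for a term have to increase strictly, because only the
difference to the previous document number is stored. The sketch below shows
such a check in Ruby. It is a simplified illustration under assumptions, not
the code from Rise or Lucene: the method name, the [doc_base, local_docs]
input layout and the error text are made up for the example.

```ruby
# Hypothetical sketch: merge the postings of one term from several segments.
# Each segment is given as [doc_base, [local_doc_id, ...]], where doc_base is
# the number of documents in all segments before it.
# Returns the delta-encoded document numbers of the merged segment.
def merge_postings(segments)
  last_doc = nil
  deltas = []

  segments.each do |doc_base, local_docs|
    local_docs.each do |local_doc|
      doc = doc_base + local_doc

      # Deltas must be positive, so a document number that does not
      # increase would corrupt the merged postings.
      if last_doc && doc <= last_doc
        raise "docs out of order (#{doc} <= #{last_doc})"
      end

      deltas << (last_doc ? doc - last_doc : doc)
      last_doc = doc
    end
  end

  deltas
end

p merge_postings([[0, [0, 2]], [3, [1, 4]]])
```

In this simplified model, two things trigger the error: postings that are
unsorted within a segment, and a doc_base that is smaller than the number of
documents already merged. Whether either of these is the cause of the bug
described above is not known.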

Since I was tired of GNU Arch's UI and switched to monotone, you also
can't get my current sources. But once I manage to set up my local server
I'll let you know.

kind regards,
/max

Re: rise

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Max - Welcome!!!

I'm literally sitting on the edge of my seat anxious for a viable  
Ruby Lucene, so I applaud your efforts.

I'm very keen on the GCJ/SWIG approach so that the Ruby version can
stay in sync with the Java version simply by running the build
process, very much like PyLucene does.  I began a native Ruby port
once upon a time myself (rucene and rubylucene at RubyForge, with
very little code but some basic file I/O actually out there) and I
dropped it once I saw PyLucene and how well it performed.

Hopefully you'd be interested in assisting with the nascent effort  
under way here, or if you come up with something on your own and  
would like to contribute it to Apache to live alongside Java Lucene,
we'd welcome it.

     Erik

