Posted to user@lucenenet.apache.org by Scott Garland <sc...@gmail.com> on 2010/11/15 22:02:40 UTC

Lucene 3

All,

About a year and a half ago I ported Lucene 2.9.x to C# and kept it up to
date until ~3.0. I never planned to release the code; however, today I've
done so on CodePlex under the name Lucille (http://lucille.codeplex.com/).

My primary reason for doing this port was to work on the internals of the
indexing and searching engines, both to get better performance on .NET and
to deal with some long-standing bugs that were preventing me from using
Lucene.Net in high-stress applications.

As for what you folks on Lucene.Net are doing, there may be some code in
what I've released that helps you move your 3.0.x effort forward. Feel free
to take what you need; I've kept the ASF license in place (I'm not a
lawyer...).

I'd be surprised if a fresh port of Lucene 3.0.x to C# were anything but a
big project. When I did this port I used JLCA and quickly discovered that it
required a ton of line-by-line review, modification and testing. During that
work I decided to switch to generics, while keeping the internals as close
to Java Lucene as possible. I tried to predict how Java Lucene would look
once 3.x was released and the Java folks began their conversion to Java 1.5
(i.e. Java generics), so my API almost certainly differs from Lucene's.
There are also some chunks of code that need to be rewritten for .NET,
specifically the use of WeakReferences for caching; using Sharpen isn't
likely to make that go away.

I stopped tracking Lucene around Sept 2009, just before their switch to Java
1.5. The main reason I stopped was that I had achieved my objectives and had
gotten what I wanted out of the code.

I'd like to address the inevitable question of why I didn't just contribute
this to Lucene.Net. The primary reason is that I wasn't really on board with
the constraints of Lucene.Net; specifically, I had no interest in
maintaining compatibility with Lucene. I needed to completely replace the
analysis and parsing code with my own. That said, a lot of the code in this
port looks like what one would want from a Lucene-to-C# port: you'll see
lots of places where I'm using generics and properties and changing things
to make the entire release more .NET friendly. There's still a long way to
go with that, but this is one area I'm committed to making even better.

Future work: I'm going to start reviewing the Java Lucene check-ins from
Sept 2009 onward and see what I need to bring this to a semi-official 3.0
release. The API remains pretty close to Lucene.Net and Java Lucene, but
it's not really 1-to-1, and I don't plan to make it so.

My #1 goal when I started this was to have the fastest, most reliable search
engine I could get; I still think that's worth pursuing.

Thanks,
Scott

Some details that you should know:
-- This is current as of Java Lucene SVN revision 803339. That was fairly
close to the Java Lucene 3.0 release; one reason I stopped here was that I
really didn't like what was happening with Attributes in Java Lucene.

-- This is a VS2010 solution and builds under .NET 3.5. I have built it
under 4.0, but there are a couple of issues that I haven't found an obvious
way to correct.

-- This will *not* compile on .NET 2.0. I use generics whenever I can, and
LINQ whenever it's appropriate. New code I add targets .NET 3.5 (which
includes lots of uses of var).

-- I'm sure it would be possible to create a VS2005 solution. If someone
wants to create one, upload a patch and I'll include it.

-- Unit tests require NUnit, and for the most part they pass cleanly, except
for a few that exercise files and a few that hit subtle differences between
threading in Java and .NET. Some unit tests fail intermittently; they are
related to threading and file access. At some point I need to sit down and
dig into those problems, but the writers/readers in this code are pretty
complicated. Unfortunately, too many of the unit tests are really
integration tests, so the entire suite takes upwards of 6 minutes to finish.

-- The release includes none of the contrib code that exists in Java or
Lucene.Net.

-- I can't release some of the higher-performance internals I have because
they belong to someone else. It has taken me a few days to excise them from
what I'm releasing.

-- I have a pretty large list of TODOs. There is a bunch of code in place
that I plan to remove; it exists only to help out when porting new work from
Java Lucene.

-- Name visibility is all over the place; lots of things are "public" that
probably shouldn't be. Some of the mechanical tools I used didn't convert
these correctly, so at some point someone should go through and start
hiding things that should be hidden.

-- There are lots of port artifacts left over that should be corrected, for
example Java "byte" vs. C# byte/sbyte (Java's byte is signed, so it maps to
C# sbyte rather than byte).

-- I've left some names in place even though I really want to change them,
specifically "Directory", "Attribute" and "Document".

-- All of the public comments were stripped out during the Java to C#
conversion.