Posted to general@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/12/09 14:49:03 UTC

[REPORT] Lucene December 2009 Board Report

=== Lucene Status Report: December, 2009 ===

TLP

-The PMC added George Aroush and Chris Mattmann to the PMC
-The PMC added Open Relevance committer Robert Muir
-The PMC added Mahout committer Jake Mannix
-The PMC added Tika committer Ken Krugler


LUCENE JAVA

Lucene Java is a search-engine toolkit.  Development has been
active, and we released both 2.9 and 3.0 this quarter.

SOLR

Solr is a full text search server using Lucene Java.  
Development and the community are active.  Solr released
version 1.4 this quarter.


NUTCH

Nutch is a web-search engine: crawler, indexer and search runtime. Since
ApacheCon there has been a flurry of discussion about Nutch's future,
spearheaded by Andrzej Bialecki and others. In addition, there is ongoing
work on reducing code duplication (tighter integration of the Tika parsing
framework and mime type detection, better Solr integration) and using a
more flexible storage system (e.g. HBase). Many issues are being fixed in
preparation for a 1.1 release early next quarter.

  
LUCY

Lucy is a loose C port of Lucene targeted at dynamic language bindings.
Development this quarter has focused on abstraction of the IO subsystem and
portability to various compiler platforms.



LUCENE.NET 

Lucene.NET is a .NET-based port of Lucene Java.  Development and the
community are active.  Lucene.NET graduated from the incubator and is 
now a full-fledged Lucene sub-project.


MAHOUT

Apache Mahout is working towards building a suite of scalable machine
learning libraries for text and data mining.  Development is active and
version 0.2 was released this quarter.

OPEN RELEVANCE PROJECT

The Open Relevance Project is a new project aimed at providing Lucene
and others with tools for judging the quality of search and machine
learning approaches.  The project added Robert Muir as a committer
this quarter and development is getting under way. Recent work
has added support for the Indonesian "Tempo" and Persian
"Hamshahri" collections, so that relevance judgements can be run
against them with lucene-benchmark.

PYLUCENE

PyLucene is a Python integration of Lucene Java. Development is
active. Closely tracking the Lucene Java releases, we released PyLucene
2.9.0, PyLucene 2.9.1 and PyLucene 3.0.0 this quarter. A major addition was
made to JCC, the code generator that makes PyLucene possible: support
for the Java generics now used by Lucene Java 3.0.

TIKA

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.  Tika released version 0.5 this quarter. There have been
recent development efforts to speed up Tika's mime detector, as well as
efforts to provide a self-contained OSGi-based Tika bundle. There is a
strong desire to release these post-0.5 improvements, so we are planning
to release Tika 0.6 in the next few weeks.