Posted to commits@couchdb.apache.org by Apache Wiki <wi...@apache.org> on 2008/04/02 13:25:42 UTC

[Couchdb Wiki] Update of "FullTextSearch" by SorenHilmer

The following page has been changed by SorenHilmer:
http://wiki.apache.org/couchdb/FullTextSearch

New page:
== Fulltext Indexing and Searching ==
CouchDB provides an interface for facilitating
integration of fulltext search engines. In addition, CouchDB supplies
a reference implementation of this interface using [http://lucene.apache.org Lucene].

=== Index interface ===

CouchDB uses stdio to interface with the search engine. Whenever a document is changed, the name of the database
containing the document is sent to stdout.

CouchDB does not expect to receive anything on stdin (read: it will crash if it does).
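
As an illustration of the indexer side of this interface, here is a minimal sketch in Python (the reference implementation itself is written in Java). It assumes that each notification is one database name on its own line, which this page does not state explicitly, and the names `watch_updates` and `log_reindex` are hypothetical.

```python
import sys
from typing import Callable, Iterable


def watch_updates(lines: Iterable[str], reindex: Callable[[str], None]) -> int:
    """Call reindex(db_name) for every database name found in `lines`.

    Assumption: one database name per line. Returns the number of
    notifications that were handled.
    """
    handled = 0
    for line in lines:
        db_name = line.rstrip("\r\n")
        if not db_name:
            continue  # skip blank lines
        reindex(db_name)
        handled += 1
    return handled


def log_reindex(db_name: str) -> None:
    # Hypothetical placeholder for the real indexing work.
    # Log to stderr only: CouchDB does not expect any input from the
    # indexer, so nothing should be written to stdout.
    print(f"reindexing {db_name}", file=sys.stderr)
```

A real indexer process would run something like `watch_updates(sys.stdin, log_reindex)` as its entry point and replace `log_reindex` with actual indexing.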

==== Setup ====

The indexer is started by CouchDB using the command line specified in
the couch.ini configuration parameter:

{{{
DbUpdateNotificationProcess
}}}


=== Search interface ===

CouchDB again uses stdio to interface with the searcher.

Currently this interface is not exposed through Futon, so to try it out you need to
start CouchDB with the interactive option -i to get an Erlang shell.

From there you can write search queries like:

{{{
couch_ft_query:execute("database", "+ query +string").
}}}


For this example the string "database\n" followed by "+ query +string\n" is
transmitted to stdout.

The result of the search is received through stdin and displayed in the shell.
Currently the format of the result is not specified; it is left to the
particular search engine.
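
The exchange described above can be sketched from the searcher's point of view as follows. This is a Python illustration only, not the Lucene searcher: the function names are hypothetical, and because the result format is unspecified, the one-line-per-query reply is an assumption made for this sketch.

```python
from typing import Callable, TextIO


def serve_queries(inp: TextIO, out: TextIO, search: Callable[[str, str], str]) -> None:
    """Answer requests that arrive as two lines: database name, then query."""
    while True:
        db_line = inp.readline()
        query_line = inp.readline()
        if not db_line or not query_line:
            break  # input closed, or an incomplete request
        db_name = db_line.rstrip("\r\n")
        query = query_line.rstrip("\r\n")
        # Assumption: one result line per query. The interface leaves the
        # result format to the particular search engine.
        out.write(search(db_name, query) + "\n")
        out.flush()  # make sure the reply is not held back in a buffer


def echo_search(db_name: str, query: str) -> str:
    # Hypothetical stand-in for a call into a real search engine.
    return f"no results for {query!r} in {db_name!r}"
```

A searcher process built on this sketch would call `serve_queries(sys.stdin, sys.stdout, echo_search)` after importing `sys`, with `echo_search` replaced by a real search function.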


==== Setup ====
The searcher is started by CouchDB using the command line specified in
the couch.ini configuration parameter:

{{{
FullTextSearchQueryServer
}}}


=== Lucene reference implementation ===

==== Use of special design document ====
A database to be indexed must contain a special design document in this format:

{{{
{
  "_id":"_design/fulltextsearch",
  "_rev":"123",
  "fulltext_options": {
    "views": {
      "names" : {"index":"view-value", "return":"document"},
      "cities": {"index":"view-key", "return":"view"}
    }
  }
}
}}}

The Lucene indexer uses the views defined in this document to guide the indexing
process.

In this example the views "names" and "cities" must also be defined in the database.
For the "names" view, Lucene will index the "view-value" and return documents as
search results;
for the "cities" view, it will index the "view-key" and return the view in search results.

For info on views in CouchDB see: Self:Views
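
To make the structure of the design document concrete, the following Python sketch reads the example above and lists, per view, what is indexed and what is returned. It only illustrates the document layout; the helper name `indexing_plan` is hypothetical and this is not how the Lucene indexer is implemented.

```python
import json
from typing import Dict, Tuple

# The example design document from this page.
DESIGN_DOC = """
{
  "_id": "_design/fulltextsearch",
  "_rev": "123",
  "fulltext_options": {
    "views": {
      "names":  {"index": "view-value", "return": "document"},
      "cities": {"index": "view-key",   "return": "view"}
    }
  }
}
"""


def indexing_plan(design_doc_json: str) -> Dict[str, Tuple[str, str]]:
    """Map each view name to a pair (what to index, what to return)."""
    doc = json.loads(design_doc_json)
    views = doc.get("fulltext_options", {}).get("views", {})
    return {name: (opts["index"], opts["return"]) for name, opts in views.items()}


for view, (index_what, return_what) in indexing_plan(DESIGN_DOC).items():
    print(f"{view}: index {index_what}, return {return_what}")
```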


==== Dependencies ====
The Lucene indexer depends on the following .jar files:
 * couchdb4j.jar (see below)
   * commons-beanutils.jar
   * commons-codec-1.3.jar
   * commons-collections.jar
   * commons-httpclient-3.1.jar
   * commons-lang.jar
   * commons-logging-1.1.jar
   * ezmorph-1.0.3.jar
   * json-lib-2.0-jdk15.jar
 * lucene-core-2.3.1.jar

Note: all the couchdb4j dependencies (as you can see, some have no
version info supplied) can probably be checked out from the
couchdb4j repository (see below).

Note: at the time of writing, couchdb4j needs to be patched using the patches
specified in issues 6 and 8
on the couchdb4j issue tracking list: http://code.google.com/p/couchdb4j/issues/list
So check out trunk, apply the patches, and build.

At least Java version 5 is needed.

==== Compiling ====
The Lucene search engine is not built as part of CouchDB.

You need to:
 * Set up a Java development environment (at least version 5).
 * Check out the CouchDB source.
 * Change directory to src/fulltext/lucene.
 * Compile using javac, with CLASSPATH containing the needed dependencies (listed above).
 * Do: jar cf !CouchLucene.jar *.class

As a result you should get a file !CouchLucene.jar to include in your CLASSPATH at
runtime.

==== Runtime setup ====
You need a path to your Java runtime (at least version 5).
You have to set up your Java CLASSPATH to contain all the .jar files listed in the
dependency list;
alternatively, you can specify it on the command line defined for the .ini options, like:

{{{
FullTextSearchQueryServer=java -cp /path/to/couchdb4j/lib/couchdb4j.jar:... LuceneSearcher
DbUpdateNotificationProcess=java -cp /path/to/couchdb4j/lib/couchdb4j.jar:... LuceneIndexer
}}}

Note: the above example works on Unix-like operating systems.
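
Writing a long classpath by hand is error-prone, so a small helper can assemble the two .ini lines. The Python sketch below is one possible way to do that; it assumes all jars have been copied into a single directory (the directory in the usage note is a hypothetical placeholder), and the function names are made up for this example.

```python
import os
from typing import Iterable

# Jar names from the dependency list on this page, plus the jar built
# in the compiling step.
JARS = [
    "couchdb4j.jar",
    "commons-beanutils.jar",
    "commons-codec-1.3.jar",
    "commons-collections.jar",
    "commons-httpclient-3.1.jar",
    "commons-lang.jar",
    "commons-logging-1.1.jar",
    "ezmorph-1.0.3.jar",
    "json-lib-2.0-jdk15.jar",
    "lucene-core-2.3.1.jar",
    "CouchLucene.jar",
]


def build_classpath(jar_dir: str, jars: Iterable[str], sep: str = os.pathsep) -> str:
    """Join the jar paths with the platform's classpath separator."""
    return sep.join(f"{jar_dir}/{jar}" for jar in jars)


def ini_line(option: str, classpath: str, main_class: str) -> str:
    """Format one couch.ini line in the shape used in the example above."""
    return f"{option}=java -cp {classpath} {main_class}"
```

For example, `ini_line("FullTextSearchQueryServer", build_classpath("/path/to/jars", JARS), "LuceneSearcher")` produces the searcher line. `os.pathsep` is `:` on Unix-like systems and `;` on Windows, which matches the separator the Java launcher expects on each platform.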