Posted to user@couchdb.apache.org by Nicolas Steinmetz <ns...@gmail.com> on 2010/01/17 14:13:01 UTC

Search feature ?

Hi,

I was hacking on some of my CouchDB code again and was wondering how I would
implement some basic search features. All the information I can gather around
the web leads me to think that there is no basic search within a vanilla
CouchDB instance. Search implies some third-party tool like Lucene
(couchdb-lucene), Xapian & co.

How do you implement some basic search features in your apps ?

Thanks,
Nicolas

-- 
Nicolas Steinmetz
http://www.steinmetz.fr - http://nicolas.steinmetz.fr/

Re: Search feature ?

Posted by Dmitry Unkovsky <oi...@gmail.com>.
On Sun, Jan 17, 2010 at 3:13 PM, Nicolas Steinmetz <ns...@gmail.com> wrote:
>
> How do you implement some basic search features in your apps ?
>

For me couchdb-lucene works pretty well. Also, I'm thinking of trying
Sphinx with its xmlpipe feature.

Cheers,
-- 
DU

Re: Search feature ?

Posted by Zachary Zolton <za...@gmail.com>.
On the downloads page, there's a pre-built JAR for v0.4:

http://github.com/rnewson/couchdb-lucene/downloads

The README linked from that page has instructions for unpacking the file.

On Mon, Jan 18, 2010 at 3:47 AM,  <me...@mac.com> wrote:
> I have tried to install couchdb-lucene without success. I use CouchDB 0.11.0b823203 under Mac OS X 10.6.2 and followed the 4 steps given for building couchdb-lucene. I tried both the latest version and the v0.4 release (as it seems that 0.4 is the version for CouchDB 0.11). As a last step, I suppose I have to configure it by updating the /path/to/couch/etc/couchdb/local.ini file, is that right?
>
> With 0.5-SNAPSHOT-dist, mvn succeeds but bin/run does not work, and I could not find out how to start the couchdb-lucene server.
>
> With v0.4, the mvn tests failed:
>  T E S T S
> -------------------------------------------------------
> Running com.github.rnewson.couchdb.lucene.TikaTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.023 sec
> Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.63 sec
> Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.729 sec
> Running com.github.rnewson.couchdb.lucene.RhinoTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.913 sec
> Running com.github.rnewson.couchdb.lucene.IntegrationTest
> Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 12.183 sec <<< FAILURE!
>
> Results :
>
> Failed tests:
>  index(com.github.rnewson.couchdb.lucene.IntegrationTest)
>  longIndex(com.github.rnewson.couchdb.lucene.IntegrationTest)
>
> Tests run: 27, Failures: 2, Errors: 0, Skipped: 1
>
>
> Any help or detailed documentation is welcome!
> I would be so happy to finally be able to run couchdb-lucene.
>
> Thanks.
> Joel
>
> On 18 Jan 2010, at 08:39, Metin Akat wrote:
>
>> I have never used Solr and I don't know what advantages does it offer
>> over "pure" lucene. But couchdb-lucene is not "pure" lucene. It's a
>> tool written specifically for couchdb. You write search views that are
>> …>
>> Of course, if you are a java expert, you may be comfortable with
>> implementing your own solution, though I don't believe it will be
>> easier than couchdb-lucene.
>

Re: Search feature ?

Posted by Metin Akat <ak...@gmail.com>.
Robert (the author of couchdb-lucene) has published a jar file here
http://github.com/rnewson/couchdb-lucene/downloads
There are instructions on how to unzip the file. I don't know if it
works with 0.11; I'm using 0.10.
Robert recommended that I use 0.4, but that was around a month ago.


On Mon, Jan 18, 2010 at 11:47 AM,  <me...@mac.com> wrote:
> I have tried to install couchdb-lucene without success. I use CouchDB 0.11.0b823203 under Mac OS X 10.6.2 and followed the 4 steps given for building couchdb-lucene. I tried both the latest version and the v0.4 release (as it seems that 0.4 is the version for CouchDB 0.11). As a last step, I suppose I have to configure it by updating the /path/to/couch/etc/couchdb/local.ini file, is that right?
>
> With 0.5-SNAPSHOT-dist, mvn succeeds but bin/run does not work, and I could not find out how to start the couchdb-lucene server.
>
> With v0.4, the mvn tests failed:
>  T E S T S
> -------------------------------------------------------
> Running com.github.rnewson.couchdb.lucene.TikaTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.023 sec
> Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.63 sec
> Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.729 sec
> Running com.github.rnewson.couchdb.lucene.RhinoTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.913 sec
> Running com.github.rnewson.couchdb.lucene.IntegrationTest
> Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 12.183 sec <<< FAILURE!
>
> Results :
>
> Failed tests:
>  index(com.github.rnewson.couchdb.lucene.IntegrationTest)
>  longIndex(com.github.rnewson.couchdb.lucene.IntegrationTest)
>
> Tests run: 27, Failures: 2, Errors: 0, Skipped: 1
>
>
> Any help or detailed documentation is welcome!
> I would be so happy to finally be able to run couchdb-lucene.
>
> Thanks.
> Joel
>
> On 18 Jan 2010, at 08:39, Metin Akat wrote:
>
>> I have never used Solr and I don't know what advantages does it offer
>> over "pure" lucene. But couchdb-lucene is not "pure" lucene. It's a
>> tool written specifically for couchdb. You write search views that are
>> …>
>> Of course, if you are a java expert, you may be comfortable with
>> implementing your own solution, though I don't believe it will be
>> easier than couchdb-lucene.
>

Re: Search feature ?

Posted by me...@mac.com.
I have tried to install couchdb-lucene without success. I use CouchDB 0.11.0b823203 under Mac OS X 10.6.2 and followed the 4 steps given for building couchdb-lucene. I tried both the latest version and the v0.4 release (as it seems that 0.4 is the version for CouchDB 0.11). As a last step, I suppose I have to configure it by updating the /path/to/couch/etc/couchdb/local.ini file, is that right?

With 0.5-SNAPSHOT-dist, mvn succeeds but bin/run does not work, and I could not find out how to start the couchdb-lucene server.

With v0.4, the mvn tests failed:
 T E S T S
-------------------------------------------------------
Running com.github.rnewson.couchdb.lucene.TikaTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.023 sec
Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.63 sec
Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.729 sec
Running com.github.rnewson.couchdb.lucene.RhinoTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.913 sec
Running com.github.rnewson.couchdb.lucene.IntegrationTest
Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 12.183 sec <<< FAILURE!

Results :

Failed tests: 
  index(com.github.rnewson.couchdb.lucene.IntegrationTest)
  longIndex(com.github.rnewson.couchdb.lucene.IntegrationTest)

Tests run: 27, Failures: 2, Errors: 0, Skipped: 1


Any help or detailed documentation is welcome!
I would be so happy to finally be able to run couchdb-lucene.

Thanks.
Joel

On 18 Jan 2010, at 08:39, Metin Akat wrote:

> I have never used Solr and I don't know what advantages does it offer
> over "pure" lucene. But couchdb-lucene is not "pure" lucene. It's a
> tool written specifically for couchdb. You write search views that are
> …> 
> Of course, if you are a java expert, you may be comfortable with
> implementing your own solution, though I don't believe it will be
> easier than couchdb-lucene.

Re: Search feature ?

Posted by Metin Akat <ak...@gmail.com>.
I have never used Solr and I don't know what advantages it offers
over "pure" Lucene. But couchdb-lucene is not "pure" Lucene. It's a
tool written specifically for CouchDB. You write search views that are
almost the same as CouchDB views. If you use couchdbkit (I don't know
about others), it supports couchdb-lucene, so it doesn't matter whether you
call a CouchDB view or a Lucene view from your code (the API is the same).
I don't know Java, and couchdb-lucene is perfectly usable as it is
accessed via JavaScript views.
Of course, if you are a Java expert, you may be comfortable with
implementing your own solution, though I don't believe it will be
easier than couchdb-lucene.

On Mon, Jan 18, 2010 at 12:18 AM, Joël Guillod <jo...@mac.com> wrote:
> On 17 Jan 2010, at 16:52, Metin Akat wrote:
>
>> Whoever needs something more sophisticated than this, IMHO the easiest
>> way is couchdb-lucene.
>
> Any trials with Solr (http://lucene.apache.org/solr/)?
>
> Since I have already experimented with Solr server (with RoR and REST access appli) I would be more comfortable with it than pure Lucene. One of the other reasons is that I already have the appropriate Solr configuration files (e.g. to deal with french language).
>
> Any clues?
>
> Thanks
> Joel

Re: Search feature ?

Posted by Joël Guillod <jo...@mac.com>.
On 17 Jan 2010, at 16:52, Metin Akat wrote:

> Whoever needs something more sophisticated than this, IMHO the easiest
> way is couchdb-lucene.

Has anyone tried it with Solr (http://lucene.apache.org/solr/)?

Since I have already experimented with a Solr server (with RoR and a REST client application), I would be more comfortable with it than with pure Lucene. Another reason is that I already have the appropriate Solr configuration files (e.g. to deal with the French language).

Any clues?

Thanks
Joel

Re: Search feature ?

Posted by Nicolas Steinmetz <ns...@gmail.com>.
Hi,

2010/1/17 Metin Akat <ak...@gmail.com>


> Nicolas, in your particular situation you need full text search.
> Nothing less will be satisfactory for you and your users. And in my
> opinion setting up couchdb-lucene is easier than writing
> "fulltext-like" views for couchdb. Go read the docs here
> http://github.com/rnewson/couchdb-lucene/tree/0.4-maint
> If you get in trouble, I could try to help you.
>

I agree that couchdb-lucene would be more efficient. But for one of my
projects (the tiny CRM app), I was thinking about using CouchDB instead of an
existing MS Access DB. As the app is to be run on Windows, I would have
liked something really easy to run.

The user of this app does not have internet access at work, so I would have
liked a self-contained app instead of having to deploy some Java software
(JDK + Lucene) + ...

So for my own app on a server, I'll give couchdb-lucene a try. For my
tiny CRM app, maybe I'll use some pure CouchDB code, depending on
requirements.

Thanks for all your replies,
Nicolas

-- 
Nicolas Steinmetz
http://www.steinmetz.fr - http://nicolas.steinmetz.fr/

Re: Search feature ?

Posted by Metin Akat <ak...@gmail.com>.
>
> That will let you find a title or body which *starts* with a given word or
> text string.
>

That's exactly what I mean.
Whoever needs something more sophisticated than this, IMHO the easiest
way is couchdb-lucene.
Lucene is much better than all the "pure CouchDB" tricks. It's faster and
more reliable, and the quality of the search results will be better.
The only case where I would recommend implementing a "pure CouchDB"
solution is if your app is written as a CouchApp and relies on
replication to distribute the app itself. Then you can't force
everyone to set up Lucene.

Nicolas, in your particular situation you need full text search.
Nothing less will be satisfactory for you and your users. And in my
opinion setting up couchdb-lucene is easier than writing
"fulltext-like" views for couchdb. Go read the docs here
http://github.com/rnewson/couchdb-lucene/tree/0.4-maint
If you run into trouble, I could try to help you.

Re: Search feature ?

Posted by Brian Candler <B....@pobox.com>.
On Sun, Jan 17, 2010 at 03:19:59PM +0100, Nicolas Steinmetz wrote:
> For ex, I would like to implement a basic search in a blog app and maybe in
> the futur in a basic CRM app. So for ex if I search for couchdb, I would
> like to find all docs with couchdb in the title or body part in my blog and
> if I look for "Doe", I will find the entry I have for all people named "Doe"
> in my CRM app.
> 
> 
> > For example I use this function to "search" for animals in my database.
> > function (doc) {
> >    if (doc.doc_type == 'Animal') {
> >            emit(doc.ear_mark, doc);
> >            emit(doc.belt, doc);
> >            emit(doc.birth_date, doc);
> >    };
> > }
> > the user starts typing (either of these properties) in a text field
> > and is able to find the animal she is looking for.
> 
> 
> Looks it's what I was looking for.

That will let you find a title or body which *starts* with a given word or
text string.

If you want to search for a word within a title, it's possible to index the
words separately:

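function (doc) {
  if (doc.title) {
    // The full title covers prefix searches on the title as a whole.
    emit(doc.title, 'title');
    // Split on runs of non-alphanumeric characters.
    var words = doc.title.split(/[^a-z0-9]+/i);
    // Start at 1, presumably because the first word is already findable
    // via the full-title row.
    for (var i=1; i<words.length; i++) {
      // Skip very short words to keep the index small.
      if (words[i].length >= 3) {
        emit(words[i], 'title part');
      }
    }
  }
}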

But for anything more sophisticated than that, you really want a plugin.

Note that couchdb's searching uses UCA collation by default. If the user
enters "Foo" as a search term then you should search for
  startkey="foo"&endkey="FOOZZZZZZZZ"

More info at: http://wiki.apache.org/couchdb/View_collation
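
As a rough sketch of the range described above, a small helper can turn a user's search term into the startkey/endkey query string. The prefixRange name and the 'ZZZZZZZZ' padding are illustrative only and simply follow the example in this message; the keys are JSON-encoded and then URL-encoded, since view keys are passed to CouchDB as JSON.

```javascript
// Hypothetical helper: builds the query-string part for a prefix search.
function prefixRange(term) {
  // Lower-case start and padded upper-case end, as in the example above.
  var startkey = JSON.stringify(term.toLowerCase());
  var endkey = JSON.stringify(term.toUpperCase() + 'ZZZZZZZZ');
  return 'startkey=' + encodeURIComponent(startkey) +
         '&endkey=' + encodeURIComponent(endkey);
}
```

The result would be appended to the view URL after a "?". Whether the padding is long enough depends on your data, so treat it as a starting point rather than a guarantee.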

Regards,

Brian.

Re: Search feature ?

Posted by Bernd Lutz <ib...@googlemail.com>.
A simple approach is to split every string you wish to index at spaces, then emit the keywords to a view.
With a process limit of 5 seconds, about 20,000 emits per document are possible.
I haven't tested it on a large-scale project, but it should work for small cases.
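
A minimal sketch of that idea, assuming documents carry a string field named body (a hypothetical field name). The emit stub only collects rows so the function can be tried outside CouchDB, where the view server provides emit itself.

```javascript
// Stand-in for CouchDB's emit(); collects [key, value] pairs.
var emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map function: one row per space-separated keyword in doc.body.
function map(doc) {
  if (typeof doc.body !== 'string') {
    return;
  }
  doc.body.split(' ').forEach(function (word) {
    // Consecutive spaces produce empty strings; skip them.
    if (word.length > 0) {
      // Lower-case so lookups do not depend on capitalisation.
      emit(word.toLowerCase(), null);
    }
  });
}

// Hypothetical sample document.
map({ _id: 'post1', body: 'CouchDB search  without Lucene' });
```

Only the body of map would go into a design document; the stub and the sample call are there for local experimentation.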


On 17.01.2010, at 15:19, Nicolas Steinmetz wrote:

> 2010/1/17 Metin Akat <ak...@gmail.com>
> 
>> Depends on what you mean by "search". If you want "fulltext search",
>> then yes, your best choice so far is couchdb-lucene. But if you just
>> want to find documents by their properties, then you don't need to use
>> some external tool.
>> 
> 
> For ex, I would like to implement a basic search in a blog app and maybe in
> the futur in a basic CRM app. So for ex if I search for couchdb, I would
> like to find all docs with couchdb in the title or body part in my blog and
> if I look for "Doe", I will find the entry I have for all people named "Doe"
> in my CRM app.
> 
> 
>> For example I use this function to "search" for animals in my database.
>> function (doc) {
>>   if (doc.doc_type == 'Animal') {
>>           emit(doc.ear_mark, doc);
>>           emit(doc.belt, doc);
>>           emit(doc.birth_date, doc);
>>   };
>> }
>> the user starts typing (either of these properties) in a text field
>> and is able to find the animal she is looking for.
> 
> 
> Looks it's what I was looking for.
> 
> Thx,
> Nicolas
> 
> -- 
> Nicolas Steinmetz
> http://www.steinmetz.fr - http://nicolas.steinmetz.fr/


Re: Search feature ?

Posted by Nicolas Steinmetz <ns...@gmail.com>.
2010/1/17 Metin Akat <ak...@gmail.com>

> Depends on what you mean by "search". If you want "fulltext search",
> then yes, your best choice so far is couchdb-lucene. But if you just
> want to find documents by their properties, then you don't need to use
> some external tool.
>

For example, I would like to implement a basic search in a blog app and maybe
in the future in a basic CRM app. So if I search for "couchdb", I would
like to find all docs with "couchdb" in the title or body in my blog, and
if I look for "Doe", I would find the entries for all people named "Doe"
in my CRM app.


> For example I use this function to "search" for animals in my database.
> function (doc) {
>    if (doc.doc_type == 'Animal') {
>            emit(doc.ear_mark, doc);
>            emit(doc.belt, doc);
>            emit(doc.birth_date, doc);
>    };
> }
> the user starts typing (either of these properties) in a text field
> and is able to find the animal she is looking for.


Looks like that's what I was looking for.

Thx,
Nicolas

-- 
Nicolas Steinmetz
http://www.steinmetz.fr - http://nicolas.steinmetz.fr/

Re: Search feature ?

Posted by Metin Akat <ak...@gmail.com>.
It depends on what you mean by "search". If you want full-text search,
then yes, your best choice so far is couchdb-lucene. But if you just
want to find documents by their properties, then you don't need
an external tool.
For example, I use this function to "search" for animals in my database:
function (doc) {
    // Index each animal under every property a user may type.
    if (doc.doc_type == 'Animal') {
        emit(doc.ear_mark, doc);
        emit(doc.belt, doc);
        emit(doc.birth_date, doc);
    }
}
The user starts typing any of these properties in a text field
and is able to find the animal she is looking for.
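
To see how a view like this supports search-as-you-type, here is a minimal sketch that runs the map function locally over a few made-up documents and then does a prefix lookup on the sorted keys. The emit stub, the sample documents, and the prefixSearch helper are hypothetical and only imitate what CouchDB does with a startkey/endkey range; plain string comparison stands in for CouchDB's real collation.

```javascript
// Rows collected by the stand-in emit(); CouchDB provides the real one.
var rows = [];
var currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Same keys as the map function above; values are left out (null) here
// to keep the rows small.
function map(doc) {
  if (doc.doc_type == 'Animal') {
    emit(doc.ear_mark, null);
    emit(doc.belt, null);
    emit(doc.birth_date, null);
  }
}

// Hypothetical sample documents.
var docs = [
  { _id: 'a1', doc_type: 'Animal', ear_mark: 'BG1234', belt: '0071', birth_date: '2009-03-14' },
  { _id: 'a2', doc_type: 'Animal', ear_mark: 'BG1299', belt: '0105', birth_date: '2008-11-02' },
  { _id: 'x1', doc_type: 'Owner', name: 'Doe' }
];

docs.forEach(function (doc) {
  currentId = doc._id;
  map(doc);
});

// A view keeps its rows sorted by key; simple string order is used here.
rows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});

// Imitates a key range query: ids of all rows whose key starts with prefix.
function prefixSearch(prefix) {
  return rows.filter(function (row) {
    return row.key.indexOf(prefix) === 0;
  }).map(function (row) {
    return row.id;
  });
}
```

With these sample documents, prefixSearch('BG12') finds both animals, while prefixSearch('2009') finds only the first one.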


On Sun, Jan 17, 2010 at 3:13 PM, Nicolas Steinmetz <ns...@gmail.com> wrote:
> Hi,
>
> Was hacking back on some of my couchdb code and I was wondering how would I
> implement some basic search features. All information I can gather around
> the web leads me to thinking that there is no basic search within a vanillia
> couchdb instance. Search implies some third parties like lucene (couchdb
> lucence), xapian & co.
>
> How do you implement some basic search features in your apps ?
>
> Thanks,
> Nicolas
>
> --
> Nicolas Steinmetz
> http://www.steinmetz.fr - http://nicolas.steinmetz.fr/
>
