Posted to user@couchdb.apache.org by Suraj Kumar <su...@inmobi.com> on 2013/10/15 15:53:53 UTC

Sub string search (and complex queries) approach review needed

Hi,

I'd like to let my users run substring searches on arbitrary document
attributes on the fly. Luckily, most of the attributes are enum-like: they
take a small, finite range of values.

What is the best way to achieve this? Is it possible to avoid writing any
middleware at all? How easy would it be to do this in Erlang, given that
I'm a complete Erlang novice?

I have outlined a 'middleware' approach below. I'd highly appreciate your
input on whether there is a better approach than this.

To support substring search on arbitrary attributes on the fly, I intend
to write a middleware API which, in combination with a set of view
functions, makes concurrent specific-key calls, merges the results, and
sends them back:

1. Build one view for each attribute on which I'd like to enable substring
   search. Each view returns the list of unique values for its attribute
   through a map-reduce.
2. Write a middleware Search API which does the following:

   a. Take attribute A and substring S as inputs.
   b. Call the view mentioned above to get the unique list of values for
      attribute A (i.e., call ".../_view/get_unique_values_of_" + A).
   c. From those values, select the subset of items that contain S
      (i.e., where substr(item, S) = true).
   d. For each full_key in the subset, make concurrent View API calls with
      ?key=full_key.
   e. Merge the results from these concurrent streams in sorted order
      (taking advantage of the fact that view results are already sorted
      for a given key) and return them to the caller as they become
      available. Assuming the 'gap' between data sets is not large, the
      middleware will buffer no more than about GAP elements internally
      before sending results out. I'm using Node.js for the middleware.
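
The steps above could be sketched roughly as follows. This is only an
illustration under assumptions, not a tested implementation: the attribute
name "status", the per-attribute view name "by_" + A, and the queryView()
helper (which is assumed to resolve to the array of rows of a view response)
are all hypothetical. For brevity the merge works on complete in-memory
arrays instead of streams.

```javascript
// Step 1: map function for a "unique values" view of one attribute.
// "status" is a hypothetical attribute. emit() is supplied by CouchDB's
// query server; in a design document this function is stored as source text.
function mapUniqueStatus(doc) {
  if (doc.status !== undefined) emit(doc.status, null);
}
// Pair it with the built-in "_count" reduce and query with ?group=true
// to get one row per distinct value.

// Step 2c: keep only the values that contain the substring S.
function matchingKeys(uniqueValues, substring) {
  return uniqueValues.filter((value) => String(value).includes(substring));
}

// Step 2e: k-way merge of lists that are each already sorted by compare().
function mergeSorted(lists, compare) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // list i is exhausted
      if (
        best === -1 ||
        compare(lists[i][positions[i]], lists[best][positions[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]++]);
  }
}

// Steps 2a to 2e combined. queryView(viewName, params) is a hypothetical
// helper that performs the HTTP request and resolves to the rows array.
async function substringSearch(attribute, substring, queryView) {
  const valueRows = await queryView('get_unique_values_of_' + attribute, {
    group: true,
  });
  const keys = matchingKeys(valueRows.map((row) => row.key), substring);
  // Step 2d: one concurrent ?key=... request per matching full key.
  const lists = await Promise.all(
    keys.map((key) => queryView('by_' + attribute, { key: key }))
  );
  // Rows that share a key come back ordered by document id, so merge by id.
  return mergeSorted(lists, (a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}
```

A streaming version would replace the arrays with per-key cursors and emit
the smallest pending row as soon as every open stream has one buffered,
which is where the GAP-sized buffer comes in.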

I'm also building this API so that clients can later run complex queries
(compound and/or rules), because our users demand it. I intend to make the
API pseudo-compatible with the CouchDB ?key="..." API, except that the
string passed as the key value will be a complex and/or rule (like
"key1=value1&key2=value2"). Perhaps CouchDB is a bad choice for this kind
of SQL-like querying, but it shines on all the other fronts of my
requirements, so I decided to make do with an approach like this.
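
To make the rule format concrete, here is a minimal sketch of how such a
string might be compiled into a predicate over documents. The grammar is an
assumption made for illustration only: "|" separates OR branches, "&"
separates AND terms within a branch, each term is "field=value", and there
is no escaping or parenthesization. The function name compileRule is
hypothetical.

```javascript
// Compile a rule such as "key1=value1&key2=value2|key3=value3" into a
// function that tests a document. Assumed grammar (not a CouchDB feature):
// "|" means OR, "&" means AND and binds tighter, terms are "field=value".
function compileRule(rule) {
  const branches = rule.split('|').map((branch) =>
    branch.split('&').map((term) => {
      const eq = term.indexOf('=');
      if (eq < 1) throw new Error('Malformed term: ' + term);
      return { field: term.slice(0, eq), value: term.slice(eq + 1) };
    })
  );
  // A document matches if every term of at least one branch matches.
  return (doc) =>
    branches.some((terms) =>
      terms.every(
        ({ field, value }) =>
          Object.prototype.hasOwnProperty.call(doc, field) &&
          String(doc[field]) === value
      )
    );
}

// Usage: build the predicate once, then apply it to each candidate document.
const isMatch = compileRule('status=active&region=eu|status=pending');
const docs = [
  { _id: '1', status: 'active', region: 'eu' },
  { _id: '2', status: 'active', region: 'us' },
  { _id: '3', status: 'pending', region: 'us' },
];
const hits = docs.filter(isMatch).map((doc) => doc._id);
```

Values are compared as strings here; a real rule language would also need
typed comparisons and a way to escape "&", "|" and "=" inside values.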

Awaiting valuable feedback from the community.

Regards,

  -Suraj

-- 
An Onion is the Onion skin and the Onion under the skin until the Onion
Skin without any Onion underneath.

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: Sub string search (and complex queries) approach review needed

Posted by "E. Kastelijn" <co...@kastelijn.nu>.
Hi Suraj,

I use ElasticSearch for this.
It is easy to install and easy to set up.
For the CouchDB connection, you can configure a "river" with a single
HTTP call.

http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/river-couchdb.html 

kind regards, 

   Egon


On Thu, 2013-10-17 at 17:17 +0530, Suraj Kumar wrote:

> Lucene has been a pain for us to set up thus far.
> 
> I'm now wondering whether it would be simpler to build an elaborate
> 'expression-based doc filtering' list function that outputs the JSON of
> documents matching a given 'expression' query rule. Its accompanying view
> would essentially index the entire document. Perhaps it could even output
> other, more complex things in the future, like sums. Yes, it is an O(N)
> search with heavy evaluation, but this is OK for now. We could also build a
> program that generates a static map function with a custom name to
> implement a 'saved search'.
> 
> 
> 
> On Tue, Oct 15, 2013 at 8:21 PM, Jens Alfke <je...@couchbase.com> wrote:
> 
> >
> > On Oct 15, 2013, at 6:53 AM, Suraj Kumar <su...@inmobi.com> wrote:
> >
> > > I'd like to enable my users to do sub string search of arbitrary
> > > attributes of documents on-the-fly. Luckily most of the attributes of
> > > the documents are like 'enum' or a finite / small range of values.
> >
> > Have you considered CouchDB-Lucene? It provides full-text search. Somewhat
> > overkill for your needs, as I understand them, but definitely easier to use
> > than implementing a bunch of code as you’ve described.
> >
> > —Jens
> 
> 
> 
> 
> -- 
> An Onion is the Onion skin and the Onion under the skin until the Onion
> Skin without any Onion underneath.
> 



Re: Sub string search (and complex queries) approach review needed

Posted by Suraj Kumar <su...@inmobi.com>.
Lucene has been a pain for us to set up thus far.

I'm now wondering whether it would be simpler to build an elaborate
'expression-based doc filtering' list function that outputs the JSON of
documents matching a given 'expression' query rule. Its accompanying view
would essentially index the entire document. Perhaps it could even output
other, more complex things in the future, like sums. Yes, it is an O(N)
search with heavy evaluation, but this is OK for now. We could also build a
program that generates a static map function with a custom name to
implement a 'saved search'.
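
A rough sketch of what such a list function might look like follows. It
assumes the accompanying view emits each whole document as the row value
(for example emit(doc._id, doc)) and that the rule arrives in a hypothetical
"expr" query parameter made of "field=value" terms joined by "&". start(),
send() and getRow() are provided by CouchDB's query server; the helper name
matchesExpression is made up, and in a real design document it would have to
be inlined into the list function, since the function is stored as source
text.

```javascript
// Helper: true if the document satisfies every "field=value" term.
// Written with "var" so that it also runs on older JavaScript engines.
function matchesExpression(doc, expr) {
  var terms = expr.split('&');
  for (var i = 0; i < terms.length; i++) {
    var eq = terms[i].indexOf('=');
    if (eq < 1) return false; // malformed term: treat as no match
    var field = terms[i].slice(0, eq);
    var value = terms[i].slice(eq + 1);
    if (String(doc[field]) !== value) return false;
  }
  return true;
}

// List function: walks every row of the view (O(N)) and streams out a JSON
// array of the documents that match req.query.expr.
function filterList(head, req) {
  start({ headers: { 'Content-Type': 'application/json' } });
  send('[');
  var first = true;
  var row;
  while ((row = getRow())) {
    if (matchesExpression(row.value, req.query.expr)) {
      send((first ? '' : ',') + JSON.stringify(row.value));
      first = false;
    }
  }
  send(']');
}
```

It would be queried through the _list endpoint of the design document, with
the rule passed as ?expr=status=active (URL-encoded as needed).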



On Tue, Oct 15, 2013 at 8:21 PM, Jens Alfke <je...@couchbase.com> wrote:

>
> On Oct 15, 2013, at 6:53 AM, Suraj Kumar <su...@inmobi.com> wrote:
>
> > I'd like to enable my users to do sub string search of arbitrary
> > attributes of documents on-the-fly. Luckily most of the attributes of
> > the documents are like 'enum' or a finite / small range of values.
>
> Have you considered CouchDB-Lucene? It provides full-text search. Somewhat
> overkill for your needs, as I understand them, but definitely easier to use
> than implementing a bunch of code as you’ve described.
>
> —Jens




-- 
An Onion is the Onion skin and the Onion under the skin until the Onion
Skin without any Onion underneath.


Re: Sub string search (and complex queries) approach review needed

Posted by Jens Alfke <je...@couchbase.com>.
On Oct 15, 2013, at 6:53 AM, Suraj Kumar <su...@inmobi.com> wrote:

> I'd like to enable my users to do sub string search of arbitrary attributes
> of documents on-the-fly. Luckily most of the attributes of the documents
> are like 'enum' or a finite / small range of values.

Have you considered CouchDB-Lucene? It provides full-text search. Somewhat overkill for your needs, as I understand them, but definitely easier to use than implementing a bunch of code as you’ve described.

—Jens