Posted to user@cassandra.apache.org by Oleg Dulin <ol...@gmail.com> on 2012/09/03 15:25:25 UTC
Text searches and free form queries
Dear Distinguished Colleagues:
I need to add full-text search and somewhat free form queries to my
application. Our data is made up of "items" that are stored in a single
column family, and we have a bunch of secondary indices for look ups.
An item has header fields and data fields, and the structure of the
items CF is a super column family with row-key being item's natural ID,
super column for header, super column for data.
Our application is made up of several redundant/load balanced servers
all pointing at a Cassandra cluster. Our servers run embedded Jetty.
I need to be able to find items by a combination of field values.
Currently I have an index for items by field value which works
reasonably well. I could also add support for data types and index
items by fields of appropriate types, so we can do range queries on
items.
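
A minimal sketch of what a typed index with range queries could look
like. This is an in-memory stand-in, not Cassandra code: the FieldIndex
class and its method names are hypothetical, and the sorted list only
mimics the way column names are kept ordered within a row.

```python
from bisect import bisect_left, insort
from datetime import date


class FieldIndex:
    """Hypothetical index for one field: sorted (value, item_id) pairs."""

    def __init__(self):
        self.entries = []  # kept sorted by value, then by item id

    def add(self, value, item_id):
        insort(self.entries, (value, item_id))

    def range(self, low, high):
        """Return item ids whose value lies in [low, high], in value order."""
        # (low,) sorts before every (low, item_id), so this finds the
        # first entry whose value is >= low.
        i = bisect_left(self.entries, (low,))
        found = []
        while i < len(self.entries) and self.entries[i][0] <= high:
            found.append(self.entries[i][1])
            i += 1
        return found


created = FieldIndex()
created.add(date(2012, 9, 3), "item1")
created.add(date(2012, 9, 5), "item2")
created.add(date(2012, 10, 7), "item3")
september = created.range(date(2012, 9, 1), date(2012, 9, 30))
```

The point of typing the values is that they sort in their natural order
(dates as dates, numbers as numbers), which is what makes a contiguous
range scan possible.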
Ultimately, though, what we want is full text search with suggestions
and human language sensitivity. We want to search by date ranges, by
field values, etc. I did some homework on this topic, and here is what
I see as options:
1) Use an SQL database as a helper. This is rather clunky, and I am not
sure what it gets us, since just about anything that can be done in SQL
can be done in Cassandra with proper structures. The other problem is
where I am going to get an open source database that can handle the
workload. Probably nowhere, and I would not get natural language
support either.
2) Each of our servers can index data using Lucene, but again we have
to come up with a clunky mechanism where either one of the servers does
the indexing and results are replicated, or each server does its own
indexing.
3) We can use Solr as is. Perhaps with some small modifications it can
run within our server JVM, since we already run embedded Jetty. I
like this idea, actually, but I know that Solr indexing doesn't take
advantage of Cassandra.
4) DataStax Enterprise with search, presumably, supports Solr indexing
of existing column families -- but for the life of me I couldn't figure
out how exactly it does that. The Wikipedia example shows that Solr can
create column families based on Solr schemas that I can then query
using Cassandra itself (which is great) and supposedly I can modify
those column families directly and Solr will reindex them (which is
even better), but I am not sure how that fits into our server design.
The other concern is lock-in to a commercial product, something I am
very much worried about.
So, one possibility I can see is using Solr embedded within our own
server solution but storing its indexes in the file system outside of
Cassandra. This is not optimal, and maybe over time I can add my own
support for storing the Solr index in Cassandra w/o relying on the
DataStax solution.
In any case, what are your thoughts and experiences?
Regards,
Oleg
Re: Text searches and free form queries
Posted by Oleg Dulin <ol...@gmail.com>.
>>
>> It works pretty fast.
> Cool.
> Just keep an eye out for how big the lucene token row gets.
> Cheers
>
>
Indeed, it may get out of hand, but for now we are ok -- for the
foreseeable future I would say.
Should it get larger, I can split it up into rows -- i.e. all tokens
that start with "a", all tokens that start with "b", etc.
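
A rough sketch of that split, assuming the row key is derived from the
first letter of the token. The "tokens:" key prefix, the
index_row_key() helper and the ShardedTokenIndex class are made up for
illustration, and a dict of sets stands in for the index CF.

```python
from collections import defaultdict


def index_row_key(token, prefix_len=1):
    """Hypothetical row key: one index row per leading token prefix."""
    return "tokens:" + token[:prefix_len]


class ShardedTokenIndex:
    """In-memory stand-in for an index CF split across several rows."""

    def __init__(self):
        self.rows = defaultdict(set)  # row key -> {(token, record key)}

    def add(self, token, record_key):
        self.rows[index_row_key(token)].add((token, record_key))

    def lookup(self, token):
        # Only the one row that can contain this token is read.
        row = self.rows.get(index_row_key(token), set())
        return {key for (tok, key) in row if tok == token}


idx = ShardedTokenIndex()
idx.add("apple", "item1")
idx.add("avocado", "item2")
idx.add("banana", "item1")
```

First-letter rows will probably be uneven, since some letters start far
more words than others. Raising prefix_len, or bucketing on a hash of
the token, would be possible variations if a single row still grows too
large.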
Re: Text searches and free form queries
Posted by aaron morton <aa...@thelastpickle.com>.
> It works pretty fast.
Cool.
Just keep an eye out for how big the lucene token row gets.
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 7/10/2012, at 2:57 AM, Oleg Dulin <ol...@gmail.com> wrote:
> So, what I ended up doing is this --
>
> As I write my records into the main CF, I tokenize some fields that I want to search on using Lucene and write an index into a separate CF, such that my columns are a composite of:
>
> luceneToken:record key
>
> I can then search my records by doing a slice for each lucene token in the search query and then do an intersection of the sets. It works pretty fast.
Re: Text searches and free form queries
Posted by Oleg Dulin <ol...@gmail.com>.
So, what I ended up doing is this --
As I write my records into the main CF, I tokenize some fields that I
want to search on using Lucene and write an index into a separate CF,
such that my columns are a composite of:
luceneToken:record key
I can then search my records by doing a slice for each lucene token in
the search query and then do an intersection of the sets. It works
pretty fast.
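
The scheme above could be sketched roughly as follows. This is an
in-memory illustration under stated assumptions, not the actual code: a
sorted list stands in for the sorted column names of the index CF, a
simple regex tokenizer stands in for the Lucene analyzer, and the
TokenIndex class and its method names are hypothetical.

```python
import re
from bisect import bisect_left

SEP = ":"  # separator in the composite column name "luceneToken:record key"


def tokenize(text):
    """Stand-in for a Lucene analyzer: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenIndex:
    """One index row whose sorted column names are 'token:recordKey'."""

    def __init__(self):
        self.columns = []  # kept sorted, like column names within a row

    def add(self, record_key, text):
        for token in tokenize(text):
            name = token + SEP + record_key
            i = bisect_left(self.columns, name)
            if i == len(self.columns) or self.columns[i] != name:
                self.columns.insert(i, name)

    def slice(self, token):
        """Record keys of all columns that start with 'token:'."""
        prefix = token + SEP
        i = bisect_left(self.columns, prefix)
        keys = set()
        while i < len(self.columns) and self.columns[i].startswith(prefix):
            keys.add(self.columns[i][len(prefix):])
            i += 1
        return keys

    def search(self, query):
        """One slice per query token, then intersect the result sets."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = self.slice(tokens[0])
        for token in tokens[1:]:
            result &= self.slice(token)
        return result


index = TokenIndex()
index.add("item1", "Red leather chair")
index.add("item2", "Red wooden table")
index.add("item3", "Blue leather sofa")
hits = index.search("red leather")
```

Because the column names sort by token first, all entries for one token
are contiguous, so each token lookup is a single range slice.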
Regards,
Oleg
On 2012-09-05 01:28:44 +0000, aaron morton said:
> AFAIK if you want to keep it inside Cassandra then use DSE, roll your
> own from scratch, or start with https://github.com/tjake/Solandra .
>
> Outside of Cassandra I've heard of people using Elastic Search or Solr
> which I *think* is now faster at updating the index.
>
> Hope that helps.
--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
Re: Text searches and free form queries
Posted by aaron morton <aa...@thelastpickle.com>.
AFAIK if you want to keep it inside Cassandra then use DSE, roll your own from scratch, or start with https://github.com/tjake/Solandra .
Outside of Cassandra I've heard of people using Elastic Search or Solr which I *think* is now faster at updating the index.
Hope that helps.
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 4/09/2012, at 3:00 AM, Andrey V. Panov <pa...@gmail.com> wrote:
> Someone built search on Lucene, but for very fresh data they build the search index in memory, so the data becomes searchable without delay.
Re: Text searches and free form queries
Posted by "Andrey V. Panov" <pa...@gmail.com>.
Someone built search on Lucene, but for very fresh data they build the
search index in memory, so the data becomes searchable without delay.
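
One way to picture that setup, as a sketch only: new entries go into an
in-memory buffer that is searched together with the durable index and
flushed into it later. The TwoTierIndex class and its methods are
hypothetical, and a plain dict stands in for the Lucene index.

```python
class TwoTierIndex:
    """Durable index plus an in-memory buffer for fresh entries."""

    def __init__(self):
        self.persisted = {}  # token -> record keys; stand-in for the durable index
        self.fresh = {}      # token -> record keys not yet flushed

    def add(self, token, record_key):
        # New data is searchable immediately through the in-memory tier.
        self.fresh.setdefault(token, set()).add(record_key)

    def flush(self):
        # Move buffered entries into the durable tier, e.g. on a timer.
        for token, keys in self.fresh.items():
            self.persisted.setdefault(token, set()).update(keys)
        self.fresh.clear()

    def lookup(self, token):
        # A search consults both tiers and merges the results.
        return self.persisted.get(token, set()) | self.fresh.get(token, set())


two_tier = TwoTierIndex()
two_tier.add("chair", "item1")
two_tier.flush()
two_tier.add("chair", "item2")
found = two_tier.lookup("chair")
```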
On 3 September 2012 22:25, Oleg Dulin <ol...@gmail.com> wrote:
> Dear Distinguished Colleagues:
>
>