Posted to dev@cassandra.apache.org by Fábio Caldas <fa...@gmail.com> on 2012/06/22 17:39:51 UTC

Solr and Cassandra

Hi folks,

I'm finalizing tests for an e-commerce company on Cassandra DSE,
especially on the Solr and Cassandra integration.

Yesterday, I found that the Solr interface
(SERVER:8983/solr/NAMESPACE.COLUMNFAMILY/select/?q=*:*&rows=10) only
shows indexed results after Cassandra flushes the data.

To check that conclusion, I did the following steps:

* Ran the DSE Log Search sample to create a cluster, with a Namespace and
ColumnFamily
* Inserted only one entry
* Checked on the Solr interface - 0 results
* Checked on CQL - 1 result
* Ran the command: nodetool -h localhost flush
* Checked on the Solr interface - 1 result
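
The check in the steps above can also be scripted, to measure how long an inserted entry takes to show up in Solr. The following is only a rough sketch: the function names, the timeout values, and the use of Solr's wt=json response format are assumptions for illustration, not part of the DSE sample.

```python
import json
import time
import urllib.parse
import urllib.request


def select_url(server, core, query="*:*", rows=10):
    """Build the Solr select URL for a NAMESPACE.COLUMNFAMILY core."""
    # wt=json asks Solr for a JSON response instead of the default XML.
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://{server}:8983/solr/{core}/select/?{params}"


def num_found(body):
    """Extract the hit count from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def wait_until_visible(url, timeout_s=60.0, interval_s=1.0):
    """Poll until the query returns a hit; return seconds waited, or None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        with urllib.request.urlopen(url) as resp:
            if num_found(resp.read().decode("utf-8")) > 0:
                return time.monotonic() - start
        time.sleep(interval_s)
    return None
```

Calling wait_until_visible(select_url("localhost", "NAMESPACE.COLUMNFAMILY")) right after the insert would give a rough figure for the indexing delay.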

I also changed the config in the yaml to force a flush at 1 MB, but
a single entry like the one in my test is far too small to trigger it.

I know it sounds very strange, but it's needed since I'm targeting
a near-real-time indexing system.

My plan now is to use crontab on my nodes to schedule the nodetool
flush command to run every minute.

Do you guys see any other approach to check?

-- 
Regards,
Fábio Caldas

Re: Solr and Cassandra

Posted by Fábio Caldas <fa...@gmail.com>.
Hey Jonathan, thanks a lot. It really solved the problem.

On Fri, Jun 22, 2012 at 3:21 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Instead of manually flushing, you should configure the solr maxTime
> option for autoSoftCommit:
> http://www.datastax.com/docs/datastax_enterprise2.0/search/dse_search_cluster#tuning-performance



-- 
Regards,
Fábio Caldas

Re: Solr and Cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.
Instead of manually flushing, you should configure the solr maxTime
option for autoSoftCommit:
http://www.datastax.com/docs/datastax_enterprise2.0/search/dse_search_cluster#tuning-performance
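
For illustration, a soft-commit setting of this kind might look like the fragment embedded below. This is a sketch under assumptions: it follows the stock Solr solrconfig.xml layout (autoSoftCommit inside updateHandler, maxTime in milliseconds), the 1000 ms value is only an example, and the helper name is hypothetical. The linked DSE page remains the reference for where the file lives and which values are sensible.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment; the 1000 ms value is illustrative only.
SOLRCONFIG_FRAGMENT = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
"""


def soft_commit_max_time_ms(solrconfig_xml):
    """Return the autoSoftCommit maxTime in milliseconds, or None if it is not set."""
    root = ET.fromstring(solrconfig_xml)
    node = root.find("./updateHandler/autoSoftCommit/maxTime")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


if __name__ == "__main__":
    print(soft_commit_max_time_ms(SOLRCONFIG_FRAGMENT))
```

A lower maxTime makes new documents searchable sooner, at the cost of more frequent soft commits, so the value is a trade-off to tune rather than a fixed recommendation.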




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com