Posted to user@cassandra.apache.org by Asit KAUSHIK <as...@gmail.com> on 2015/01/07 14:40:52 UTC

Are Triggers in Cassandra 2.1.2 a Performance Hog?

Hi all,

We are trying to integrate Elasticsearch with Cassandra. Since the river
plugin runs a select * over the whole table, it seems to be a bad
performance choice, so I was thinking of inserting into Elasticsearch from
a Cassandra trigger. I'd like your view: does a Cassandra trigger impact
the read/write performance of Cassandra?

If you achieve this some other way, please guide me. I am stuck on this.

Regards
Asit

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by Ryan Svihla <rs...@foundev.pro>.
@Ken I actually support a lot of the DSE Search users and teach classes on
it. As long as you're not dropping mutations you're in sync; if you are
dropping mutations you're probably sized way too small anyway, and once
you run repair (which you should be doing anyway when dropping mutations)
you're back in sync. Because of that, I actually think the models work
well together.
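
A toy model may make Ken's concern and this answer concrete. It is a sketch under stated assumptions, not how DSE Search works internally: each replica indexes exactly the writes it applies, replica 3 drops one mutation, and the hypothetical repair() method (a naive merge that gives every replica the newest copy of every row) stands in for nodetool repair. A count(*) against each node's local index disagrees until that merge runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: per-replica local indexes diverging and re-converging. */
class ReplicaIndexDemo {
    static final class Replica {
        final Map<String, Long> rows = new HashMap<>();   // key -> write timestamp
        final Set<String> index = new HashSet<>();        // this node's own search index

        void apply(String key, long ts) {
            rows.merge(key, ts, Math::max);   // last write wins by timestamp
            index.add(key);                   // indexed locally as the write is applied
        }
    }

    /** Three replicas (RF=3) receive three inserts; replica 3 drops the last one. */
    static List<Replica> writeUsersDroppingOne() {
        List<Replica> replicas = Arrays.asList(new Replica(), new Replica(), new Replica());
        for (String user : Arrays.asList("u1", "u2", "u3")) {
            for (int i = 0; i < replicas.size(); i++) {
                boolean dropped = user.equals("u3") && i == 2;
                if (!dropped) replicas.get(i).apply(user, 1L);
            }
        }
        return replicas;
    }

    /** What a count(*) against each node's local index would return. */
    static List<Integer> indexCounts(List<Replica> replicas) {
        List<Integer> counts = new ArrayList<>();
        for (Replica r : replicas) counts.add(r.index.size());
        return counts;
    }

    /** Hypothetical stand-in for repair: every replica gets the newest copy of every row. */
    static void repair(List<Replica> replicas) {
        Map<String, Long> merged = new HashMap<>();
        for (Replica r : replicas) r.rows.forEach((k, ts) -> merged.merge(k, ts, Math::max));
        for (Replica r : replicas) merged.forEach(r::apply);
    }

    public static void main(String[] args) {
        List<Replica> replicas = writeUsersDroppingOne();
        System.out.println("before repair: " + indexCounts(replicas)); // [3, 3, 2]
        repair(replicas);
        System.out.println("after repair:  " + indexCounts(replicas)); // [3, 3, 3]
    }
}
```

Real repair compares Merkle trees per token range rather than copying whole tables; the point of the toy is only the effect on each node's local index.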

FWIW the improvement since DSE 3.0 is MASSIVE (it's been what I'd call
stable since 3.2.x, and we're on 4.6 now).

@Asit to answer the ES question: it's not really my place to say what the
lag will be or to advise on sizing ES, so that's probably more of a
question for them.


On Wed, Jan 7, 2015 at 8:56 AM, Asit KAUSHIK <as...@gmail.com>
wrote:

> Hi all,
>
> What I intend to do is push the data to Elasticsearch from the trigger on
> every write. I know it would impact Cassandra writes, but given that writes
> are quite performant in Cassandra, would that lag be a big one?
>
> Also, as far as I know, Solr has limitations with nested JSON documents,
> which Elasticsearch handles seamlessly, hence it was our preference.
>
> Please let me know your thoughts on this, as we are stuck on it. I am
> also looking into the streaming part of Cassandra in the hope that I can
> find something.
>
> Regards
> Asit


-- 

Thanks,
Ryan Svihla

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by Asit KAUSHIK <as...@gmail.com>.
Hi all,

What I intend to do is push the data to Elasticsearch from the trigger on
every write. I know it would impact Cassandra writes, but given that writes
are quite performant in Cassandra, would that lag be a big one?

Also, as far as I know, Solr has limitations with nested JSON documents,
which Elasticsearch handles seamlessly, hence it was our preference.

Please let me know your thoughts on this, as we are stuck on it. I am
also looking into the streaming part of Cassandra in the hope that I can
find something.

Regards
Asit



On Wed, Jan 7, 2015 at 8:16 PM, Ken Hancock <ke...@schange.com> wrote:

> When last I looked at DataStax Enterprise (DSE 3.0-ish), it exhibited the
> same problem that you highlight, no different from your good idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group. If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index. A Solr query such as a
> count(*) of users could return inconsistent results depending on which
> node you hit, since Solr didn't support Cassandra consistency levels.
>
> I haven't seen any blog posts or docs on whether this intrinsic mismatch
> between Cassandra's eventual consistency and Solr has ever been resolved.
>
> Ken

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by Jack Krupansky <ja...@gmail.com>.
DSE now does have a queue that decouples Cassandra inserts from Solr
indexing. It blocks only when/if the queue fills up, and you can configure
the size of the queue. So, to be clear, DSE no longer has the problem Ken
highlighted for the ES approach.
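
The decoupling described here can be sketched with a standard java.util.concurrent.ArrayBlockingQueue. This illustrates the pattern only and is not DSE code; IndexingQueue is a hypothetical class and the Consumer<String> stands in for the real indexing call. enqueue() returns at once unless the queue is full, in which case it blocks, matching the behaviour described; tryEnqueue() shows the alternative of giving up after a timeout so that a write is never held indefinitely.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: writers enqueue, a single background thread indexes. */
class IndexingQueue {
    // Sentinel that stops the worker; a distinct object, so it is compared by identity.
    private static final String POISON = new String("stop");

    private final BlockingQueue<String> queue;
    private final Thread worker;

    IndexingQueue(int capacity, Consumer<String> indexer) {
        this.queue = new ArrayBlockingQueue<>(capacity);   // the configurable queue size
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();             // waits until something is queued
                    if (doc == POISON) {
                        return;
                    }
                    indexer.accept(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "indexer");
        worker.setDaemon(true);   // a forgotten shutdown() won't keep the JVM alive
        worker.start();
    }

    /** Write-path call: returns at once unless the queue is full, then blocks for space. */
    void enqueue(String doc) throws InterruptedException {
        queue.put(doc);
    }

    /** Alternative: wait at most timeoutMs for space, then report failure instead of blocking. */
    boolean tryEnqueue(String doc, long timeoutMs) throws InterruptedException {
        return queue.offer(doc, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Lets the worker finish everything already queued, then stops it. */
    void shutdown() throws InterruptedException {
        queue.put(POISON);
        worker.join();
    }
}
```

The capacity argument plays the role of the configurable queue size: a larger queue absorbs longer indexing stalls before writers start to block, at the cost of more not-yet-indexed data held in memory if the node dies.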

-- Jack Krupansky

On Wed, Jan 7, 2015 at 9:46 AM, Ken Hancock <ke...@schange.com> wrote:

> When last I looked at DataStax Enterprise (DSE 3.0-ish), it exhibited the
> same problem that you highlight, no different from your good idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group. If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index. A Solr query such as a
> count(*) of users could return inconsistent results depending on which
> node you hit, since Solr didn't support Cassandra consistency levels.
>
> I haven't seen any blog posts or docs on whether this intrinsic mismatch
> between Cassandra's eventual consistency and Solr has ever been resolved.
>
> Ken

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by Ken Hancock <ke...@schange.com>.
When last I looked at DataStax Enterprise (DSE 3.0-ish), it exhibited the
same problem that you highlight, no different from your good idea of
asynchronously pushing to ES.

Each Cassandra write was indexed independently by each server in the
replication group. If a node timed out or a mutation was dropped, that
Solr node would have an out-of-sync index. A Solr query such as a
count(*) of users could return inconsistent results depending on which
node you hit, since Solr didn't support Cassandra consistency levels.

I haven't seen any blog posts or docs on whether this intrinsic mismatch
between Cassandra's eventual consistency and Solr has ever been resolved.

Ken


On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan <do...@gmail.com> wrote:

> Be very, very careful not to make blocking calls to Elasticsearch in your
> trigger, otherwise you will kill C* performance. The biggest danger of
> triggers in their current state is that they are on the write path.
>
> In your trigger, you can try to push the mutation asynchronously to ES,
> but that means managing a thread pool and all the related issues.
>
> Not to mention atomicity issues: what happens if the update to ES fails
> or the connection times out? etc...
>
> As an alternative to implementing the ES integration yourself, you can
> have a look at the DataStax Enterprise integration of Cassandra with
> Apache Solr (not free), or at open-source alternatives like the Stratio
> or TupleJump forks of Cassandra with Lucene integration.

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by DuyHai Doan <do...@gmail.com>.
Be very, very careful not to make blocking calls to Elasticsearch in your
trigger, otherwise you will kill C* performance. The biggest danger of
triggers in their current state is that they are on the write path.

In your trigger, you can try to push the mutation asynchronously to ES, but
that means managing a thread pool and all the related issues.

Not to mention atomicity issues: what happens if the update to ES fails or
the connection times out? etc...
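
A minimal sketch of the asynchronous push suggested here, assuming the trigger only needs to hand a document off and return. AsyncIndexer is a hypothetical class, not part of Cassandra or any Elasticsearch client, and the Consumer<String> sink stands in for whatever ES call you would actually make. Instead of blocking the write when the pool's bounded queue is full, submit() drops the document and counts it; a failed ES call is counted too. Both counters mark exactly the atomicity gap described above: those documents have to be re-indexed by some other process.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical helper: hands documents to a bounded pool so the caller never waits on ES. */
class AsyncIndexer implements AutoCloseable {
    private final ThreadPoolExecutor pool;
    private final Consumer<String> sink;              // stand-in for the real ES client call
    private final AtomicLong rejected = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    AsyncIndexer(int threads, int queueCapacity, Consumer<String> sink) {
        this.sink = sink;
        // Fixed-size pool with a bounded queue; AbortPolicy throws instead of blocking when full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Never blocks; returns false if the document was dropped because the queue was full. */
    boolean submit(String doc) {
        try {
            pool.execute(() -> {
                try {
                    sink.accept(doc);
                } catch (RuntimeException e) {
                    failed.incrementAndGet();         // ES error or timeout: needs re-indexing later
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            rejected.incrementAndGet();               // dropped: also needs re-indexing later
            return false;
        }
    }

    long rejectedCount() { return rejected.get(); }

    long failedCount() { return failed.get(); }

    /** Lets queued work finish, waiting up to 10 seconds. */
    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Choosing AbortPolicy rather than CallerRunsPolicy (which would run the ES call on the submitting thread) is what keeps the write path from ever waiting on Elasticsearch; the price is that dropped documents must be caught up elsewhere.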

As an alternative to implementing the ES integration yourself, you can
have a look at the DataStax Enterprise integration of Cassandra with Apache
Solr (not free), or at open-source alternatives like the Stratio or
TupleJump forks of Cassandra with Lucene integration.

On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK <as...@gmail.com>
wrote:

> Hi all,
>
> We are trying to integrate Elasticsearch with Cassandra. Since the river
> plugin runs a select * over the whole table, it seems to be a bad
> performance choice, so I was thinking of inserting into Elasticsearch from
> a Cassandra trigger. I'd like your view: does a Cassandra trigger impact
> the read/write performance of Cassandra?
>
> If you achieve this some other way, please guide me. I am stuck on this.
>
> Regards
> Asit
>
>

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
+1.  Don't use triggers.

On Wed, Jan 7, 2015 at 10:49 AM, Robert Coli <rc...@eventbrite.com> wrote:
> I would not use triggers in production in their current form.
>
> =Rob



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK <as...@gmail.com>
wrote:

> We are trying to integrate Elasticsearch with Cassandra. Since the river
> plugin runs a select * over the whole table, it seems to be a bad
> performance choice, so I was thinking of inserting into Elasticsearch from
> a Cassandra trigger. I'd like your view: does a Cassandra trigger impact
> the read/write performance of Cassandra?
>

I would not use triggers in production in their current form.

=Rob