Posted to commits@cassandra.apache.org by "Brian ONeill (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/04/09 03:57:18 UTC

[jira] [Issue Comment Edited] (CASSANDRA-1311) Triggers

    [ https://issues.apache.org/jira/browse/CASSANDRA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249647#comment-13249647 ] 

Brian ONeill edited comment on CASSANDRA-1311 at 4/9/12 1:56 AM:
-----------------------------------------------------------------

Agreed.  I don't think we should include REST in the formal API either; I'm just offering it up as a design pattern for those who need to do more than fits in a little JavaScript snippet.

We are deep in performance/stress testing right now, and we have two models working: one where triggers run synchronously (prior to the write), and another where triggers execute asynchronously after the write.  Both are useful for different things: asynchronous where we can't slow down the actual write (e.g. user interactions), and synchronous where we need to enforce integrity.

Additionally, we see a need for two levels of guarantees.  For some of the triggers, we don't really care if the trigger failed, because we can rely on a regular map/reduce job to clean up any failed trigger executions.  We'd rather not even have the overhead of a CSCL.  The system just needs to execute the trigger for us (if it can).  If it fails, oh well.

For other jobs (synchronous or asynchronous), we need to know when we are in a bad state, i.e. whether the data is ever out of sync with a side effect of a trigger.  For these scenarios, the overhead of the CSCL is acceptable.  With it, we can see failed trigger executions even in the event of a crash (e.g. log entries left in a PENDING state longer than some acceptable time period are considered failed, and we need to go rectify the situation).
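
The PENDING rule above can be sketched roughly as follows.  This is only an in-memory illustration of the state check, not the CSCL itself: the TriggerLog class, its method names, and the maxPendingMillis parameter are all made up for the example, and a real log would have to be durable to survive a crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a durable trigger log (assumption, not Cassandra code).
final class TriggerLog {
    enum State { PENDING, DONE }

    private static final class LogEntry {
        final long startedAtMillis;
        volatile State state = State.PENDING;
        LogEntry(long startedAtMillis) { this.startedAtMillis = startedAtMillis; }
    }

    private final Map<String, LogEntry> entries = new ConcurrentHashMap<>();

    // Record the entry before the trigger runs, so a crash leaves it PENDING.
    void begin(String mutationId, long nowMillis) {
        entries.put(mutationId, new LogEntry(nowMillis));
    }

    // Mark the entry once the trigger has completed successfully.
    void complete(String mutationId) {
        LogEntry entry = entries.get(mutationId);
        if (entry != null) {
            entry.state = State.DONE;
        }
    }

    // Entries still PENDING for longer than maxPendingMillis are treated as failed.
    List<String> findFailed(long nowMillis, long maxPendingMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, LogEntry> e : entries.entrySet()) {
            LogEntry entry = e.getValue();
            if (entry.state == State.PENDING
                    && nowMillis - entry.startedAtMillis > maxPendingMillis) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

A periodic job would call findFailed with the current time and the acceptable delay, then repair whatever it returns.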

Unless there are transactional semantics, I think it suffices to have three interception points:
# Pre-mutation synchronous (blocking until trigger execution completes)
#* Trigger can add additional mutations
#** (additional columns to a row "in-transaction" seems useful)
#* Trigger can fail the operation 
#** (quality/integrity checks)
# Post-mutation synchronous
#* Upon failure, we can signal "trigger failure" to the client suggesting retry, but it doesn't fail the actual operation 
#** (since it's already happened, and we don't want to add rollback)
# Post-mutation asynchronous
#* No influence on the write (obviously), but we need a guarantee that the trigger executes, or to know when it has not.
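
To make the three interception points concrete, here is a minimal sketch of how a write path might call them.  Every name in it (Mutation, Trigger, WritePath, WriteResult, TriggerRejectedException) is hypothetical and chosen for illustration; it is not an existing Cassandra API, and the list standing in for the actual write is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// All types below are hypothetical, for illustration only.
final class Mutation {
    final String key;
    final List<String> columns = new ArrayList<>();
    Mutation(String key) { this.key = key; }
}

final class TriggerRejectedException extends RuntimeException {
    TriggerRejectedException(String message) { super(message); }
}

interface Trigger {
    // 1) Pre-mutation synchronous: may add columns, or throw to fail the write.
    default void beforeWrite(Mutation m) {}
    // 2) Post-mutation synchronous: a failure is reported, but the write stands.
    default void afterWrite(Mutation m) {}
    // 3) Post-mutation asynchronous: runs off the write path.
    default void afterWriteAsync(Mutation m) {}
}

final class WriteResult {
    final boolean triggerFailed; // hint to the client that a retry may be needed
    WriteResult(boolean triggerFailed) { this.triggerFailed = triggerFailed; }
}

final class WritePath {
    private final List<Trigger> triggers;
    private final Executor asyncExecutor;
    private final List<Mutation> applied = new ArrayList<>(); // stand-in for the real write

    WritePath(List<Trigger> triggers, Executor asyncExecutor) {
        this.triggers = triggers;
        this.asyncExecutor = asyncExecutor;
    }

    WriteResult write(Mutation m) {
        // Pre-mutation: any exception propagates, so the mutation is never applied.
        for (Trigger t : triggers) {
            t.beforeWrite(m);
        }
        applied.add(m);

        // Post-mutation synchronous: remember the failure, but do not roll back.
        boolean triggerFailed = false;
        for (Trigger t : triggers) {
            try {
                t.afterWrite(m);
            } catch (RuntimeException e) {
                triggerFailed = true;
            }
        }

        // Post-mutation asynchronous: hand off and return without waiting.
        for (Trigger t : triggers) {
            asyncExecutor.execute(() -> t.afterWriteAsync(m));
        }
        return new WriteResult(triggerFailed);
    }

    int appliedCount() { return applied.size(); }
}
```

Note that the asynchronous hand-off here gives no execution guarantee on its own; that would need something like the logged state discussed earlier.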

For each of these, I think there are two levels of guarantees, either:
# You don't necessarily care if ALL executions were successful; you'd rather be fast
#* (e.g. statistics / analytics that need to be "close-enough")
# You absolutely need to know if data changed and a trigger was unsuccessful in processing that mutation.
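
One way to picture the two levels is as a per-trigger setting that decides whether any bookkeeping happens at all.  The sketch below is an assumption-laden illustration: Guarantee, GuaranteedRunner, and the in-memory set of unfinished IDs are invented names, and the set stands in for whatever durable record a real implementation would keep.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names, for illustration only.
enum Guarantee { BEST_EFFORT, TRACKED }

final class GuaranteedRunner {
    // IDs of TRACKED executions that started but did not finish successfully.
    private final Set<String> unfinished = ConcurrentHashMap.newKeySet();

    void run(String mutationId, Guarantee level, Runnable trigger) {
        if (level == Guarantee.TRACKED) {
            unfinished.add(mutationId); // bookkeeping cost is paid only at this level
        }
        try {
            trigger.run();
            unfinished.remove(mutationId); // no-op for BEST_EFFORT
        } catch (RuntimeException e) {
            // BEST_EFFORT: nothing is recorded; a periodic batch job is assumed to repair it.
            // TRACKED: the ID stays in the set, so the failure remains visible.
        }
    }

    Set<String> unfinished() { return unfinished; }
}
```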

random thoughts,
-brian



                
> Triggers
> --------
>
>                 Key: CASSANDRA-1311
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1311
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Maxim Grinev
>             Fix For: 1.2
>
>         Attachments: HOWTO-PatchAndRunTriggerExample-update1.txt, HOWTO-PatchAndRunTriggerExample.txt, ImplementationDetails-update1.pdf, ImplementationDetails.pdf, trunk-967053.txt, trunk-984391-update1.txt, trunk-984391-update2.txt
>
>
> Asynchronous triggers are a basic mechanism for implementing various use cases of asynchronous execution of application code on the database side, for example to support indexes and materialized views, online analytics, and push-based data propagation.
> Please find the motivation, triggers description and list of applications:
> http://maxgrinev.com/2010/07/23/extending-cassandra-with-asynchronous-triggers/
> An example of using triggers for indexing:
> http://maxgrinev.com/2010/07/23/managing-indexes-in-cassandra-using-async-triggers/
> Implementation details are attached.
