Posted to dev@apex.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/10/23 06:44:58 UTC

[jira] [Commented] (APEXMALHAR-2181) Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15599192#comment-15599192 ] 

ASF GitHub Bot commented on APEXMALHAR-2181:
--------------------------------------------

GitHub user ananthc opened a pull request:

    https://github.com/apache/apex-malhar/pull/466

    APEXMALHAR-2181 Added Cassandra Upsert Operator with PreparedStatemen…

    @PramodSSImmaneni / @ashwinchandrap / @sanjaypujare / @DT-Priyanka : Please review 
    
    @tweise / @PramodSSImmaneni - This pull request bumps the Guava library from 14.x to 16.x. This was required to support new functionality in the Cassandra upsert operator, because the Cassandra driver needed an update.
    
    A brief description of the new functionality supported by this operator is here: https://issues.apache.org/jira/browse/APEXMALHAR-2181

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ananthc/apex-malhar APEXMALHAR-2181.CassandraUpsertOperator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #466
    
----
commit 60eea9d7f5606a71f07114b323aa2dfbd89ffd22
Author: ananthc <an...@gmail.com>
Date:   2016-10-23T06:35:15Z

    APEXMALHAR-2181 Added Cassandra Upsert Operator with PreparedStatements and EXACTLY_ONCE semantics support

----


> Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator
> ----------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2181
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ananth
>            Assignee: Ananth
>
>  An abstract operator that mutates Cassandra rows using PreparedStatements for faster execution.
>   It accommodates EXACTLY_ONCE semantics if concrete implementations provide a meaningful implementation of
>   an abstract method. As Cassandra is not a pure transactional database, the burden is on the concrete
>   implementation of the operator, and ONLY during the reconciliation window (not for any other windows).
>  ===========================================================
>   The typical execution flow is as follows:
>    1. Create a concrete implementation of this class by extending it and implementing a few methods.
>    2. Define the payload, a POJO that represents a Cassandra row. The payload is part of the execution context
>       {@link UpsertExecutionContext} and is a template parameter of this class.
>    3. The upstream operator that wants to write to Cassandra does the following:
>        a. Creates an instance of {@link UpsertExecutionContext}.
>        b. Sets the payload (an instance of the POJO defined in step 2 above).
>        c. Sets additional execution context parameters such as the collection handling style, list placement style,
>           overriding TTL, update only if primary keys exist, and consistency level.
>    4. The concrete implementation then executes this context as a Cassandra row mutation.
>  ===========================================================
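
The execution flow above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the operator's actual API: `ContextSketch`, `UpsertOperatorSketch`, `UserRow` and every field name are hypothetical stand-ins, and the "mutation" is only recorded as a string instead of being sent to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the execution context; all member names are assumptions. */
class ContextSketch<T> {
    enum ListPlacement { APPEND, PREPEND }

    T payload;                                          // the POJO representing one row
    ListPlacement listPlacement = ListPlacement.APPEND;
    Integer ttlSeconds;                                 // null: use the connection-level default
    boolean updateOnlyIfExists;                         // false: plain upsert
}

/** Hypothetical payload POJO representing one Cassandra row (step 2). */
class UserRow {
    String userId;
    List<String> emails = new ArrayList<>();
}

/** Step 1: abstract base class; this sketch only records what it would have executed. */
abstract class UpsertOperatorSketch<T> {
    final List<String> mutations = new ArrayList<>();

    /** Concrete subclasses name the table they write to. */
    abstract String tableName();

    /** Step 4: turn one execution context into one (recorded) row mutation. */
    void process(ContextSketch<T> ctx) {
        String ttl = ctx.ttlSeconds == null ? "default" : ctx.ttlSeconds + "s";
        mutations.add(tableName()
            + " placement=" + ctx.listPlacement
            + " ttl=" + ttl
            + " ifExists=" + ctx.updateOnlyIfExists);
    }
}

/** Step 1, continued: the concrete implementation. */
class UserUpsertSketch extends UpsertOperatorSketch<UserRow> {
    @Override
    String tableName() {
        return "users";
    }
}

class ExecutionFlowDemo {
    /** Step 3: what an upstream operator would do for each tuple. */
    static ContextSketch<UserRow> buildContext() {
        UserRow row = new UserRow();
        row.userId = "u1";
        row.emails.add("u1@example.org");

        ContextSketch<UserRow> ctx = new ContextSketch<>();       // 3a: create the context
        ctx.payload = row;                                         // 3b: set the payload
        ctx.listPlacement = ContextSketch.ListPlacement.PREPEND;   // 3c: per-tuple settings
        ctx.ttlSeconds = 3600;
        return ctx;
    }

    public static void main(String[] args) {
        UserUpsertSketch operator = new UserUpsertSketch();
        operator.process(buildContext());
        System.out.println(operator.mutations.get(0));
    }
}
```

The point of the split is that per-tuple choices (list placement, TTL, update-only) travel with the tuple, while table-level decisions stay in the concrete operator.
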
>   This operator supports the following features:
>   1. Highly customizable connection policies. This is achieved by specifying the ConnectionStateManager.
>      A good number of connection management aspects can be controlled via {@link ConnectionStateManager},
>      such as consistency, load balancing, connection retries, the table to use and the keyspace to use.
>      Please refer to the javadoc of {@link ConnectionStateManager}.
>   2. Support for collections: Map, List and Set are supported.
>      User defined types as part of collections are also supported.
>   3. Support exists for both adding to an existing collection and removing entries from an existing collection.
>      The POJO field that represents a collection holds the entries that are to be added or removed.
>      This avoids the pattern of reading the column and then writing the final value back into Cassandra,
>      which suits low latency / high write applications because no read is needed in the process.
>   4. Supports list placements: the execution context can be used to specify where the new incoming list
>      is to be added (in case there is an existing list in the current column of the current row being mutated).
>      Supported options are APPEND or PREPEND to an existing list.
>   5. Support for user defined types. A POJO can have fields that represent Cassandra columns of custom
>      user defined types. Concrete implementations of the operator provide a mapping of the Cassandra column name
>      to the TypeCodec that is to be used for that field inside Cassandra. Please refer to the javadoc of
>      {@link this.getCodecsForUserDefinedTypes() } for more details.
>   6. Support for custom mapping of POJO payload field names to Cassandra column names. Practically speaking,
>      POJO field names might not always match Cassandra column names, hence this support. This also avoids
>      writing a POJO just for the Cassandra operator, so an existing POJO can be passed to this operator.
>      Please refer to the javadoc of {@link this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
>   7. TTL support: a default TTL can be set for the connection (via {@link ConnectionStateManager}) and then used
>      for all mutations. This TTL can further be overridden at a tuple execution level to accommodate use cases of
>      setting custom column expirations, typically useful in wide row implementations.
>   8. Support for counter column tables. The values inside the incoming POJO are added to or subtracted from
>      the counter column accordingly. Please note that the value is not set as an absolute value; it
>      represents the amount to be added to or subtracted from the current counter.
>   9. Support for composite primary keys. All the POJO fields that map to the composite
>      primary key are used to resolve the primary key in case of a composite primary key table.
>   10. Support for conditional updates: this operator can be used as an update-only operator as opposed to an
>       upsert operator, i.e. update only IF EXISTS. This is achieved by setting the appropriate boolean in the
>       {@link UpsertExecutionContext} tuple that is passed from the upstream operator.
>   11. Lenient mapping of POJO fields to Cassandra column names. By default, POJO field names are matched
>       case-insensitively to Cassandra column names. This can be further customized by overriding mappings.
>       Please refer to feature 6 above.
>   12. Defaults can be overridden at a tuple execution level for TTL and consistency policies.
>   13. Support for handling nulls, i.e. whether null values in the POJO are to be persisted as is or ignored, so
>       that the application need not perform a read to populate a POJO field if it is not available in the context.
>   14. A few autometrics are provided for monitoring the latency aspects of the Cassandra cluster.
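
To make features 3, 4, 7, 8 and 10 concrete, here is a rough sketch of the kind of CQL text a prepared statement could carry for each case. It is an assumption about the statement shapes, not a description of how the operator in the pull request builds its statements; `CqlSketch` and its method names are hypothetical, and the table and column names are made up.

```java
/** Hypothetical helper that builds CQL text with bind markers; not part of the operator. */
class CqlSketch {
    enum ListPlacement { APPEND, PREPEND }

    /**
     * Builds an UPDATE that adds to a list column without reading it first.
     * APPEND yields "col = col + ?", PREPEND yields "col = ? + col".
     */
    static String listUpdate(String table, String listColumn, String keyColumn,
                             ListPlacement placement, boolean withTtl, boolean ifExists) {
        StringBuilder cql = new StringBuilder("UPDATE ").append(table);
        if (withTtl) {
            cql.append(" USING TTL ?");            // per-tuple TTL override (feature 7)
        }
        cql.append(" SET ").append(listColumn).append(" = ");
        if (placement == ListPlacement.PREPEND) {
            cql.append("? + ").append(listColumn); // new entries go first (feature 4)
        } else {
            cql.append(listColumn).append(" + ?"); // new entries go last (feature 4)
        }
        cql.append(" WHERE ").append(keyColumn).append(" = ?");
        if (ifExists) {
            cql.append(" IF EXISTS");              // update-only mode (feature 10)
        }
        return cql.toString();
    }

    /** Builds a counter UPDATE; the bound value is a delta, not an absolute value (feature 8). */
    static String counterUpdate(String table, String counterColumn, String keyColumn) {
        return "UPDATE " + table
            + " SET " + counterColumn + " = " + counterColumn + " + ?"
            + " WHERE " + keyColumn + " = ?";
    }

    public static void main(String[] args) {
        System.out.println(listUpdate("ks.users", "emails", "user_id",
            ListPlacement.PREPEND, true, true));
        System.out.println(counterUpdate("ks.page_views", "views", "page_id"));
    }
}
```

Because each variant is fixed text with bind markers, it can be prepared once and reused for every tuple that asks for the same combination of options. A negative delta bound to the counter statement subtracts from the counter.
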



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)