Posted to dev@apex.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/11/22 05:18:59 UTC

[jira] [Commented] (APEXMALHAR-2181) Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685731#comment-15685731 ] 

ASF GitHub Bot commented on APEXMALHAR-2181:
--------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/apex-malhar/pull/466


> Non-Transactional Prepared Statement Based Cassandra Upsert (Update + Insert ) output Operator
> ----------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2181
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2181
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ananth
>            Assignee: Ananth
>
>  An abstract operator that mutates Cassandra rows using PreparedStatements for faster execution.
>   It accommodates EXACTLY_ONCE semantics if concrete implementations provide a meaningful implementation of
>   an abstract method. As Cassandra is not a purely transactional database, this burden falls on the concrete
>   implementation of the operator, and ONLY during the reconciliation window (not for any other window).
>  ===========================================================
>   The typical execution flow is as follows:
>    1. Create a concrete implementation by extending this class and implementing a few methods.
>    2. Define the payload, i.e. the POJO that represents a Cassandra row. The payload is carried in the
>       execution context {@link UpsertExecutionContext} and is a template parameter of this class.
>    3. The upstream operator that wants to write to Cassandra does the following:
>        a. Creates an instance of {@link UpsertExecutionContext}
>        b. Sets the payload (an instance of the POJO defined in step 2 above)
>        c. Sets additional execution context parameters such as the collection handling style, list placement
>           style, overriding TTL, update only if primary keys exist, consistency level etc.
>    4. The concrete implementation then executes this context as a Cassandra row mutation.
>  ===========================================================
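The four steps above can be sketched roughly as follows. This is a self-contained illustration only: it does not use the Malhar classes or the Cassandra driver, and `UpsertFlowSketch`, `ContextSketch`, `UserRow`, `toCql` and the `ks.users` table are hypothetical names chosen for the example. The real {@link UpsertExecutionContext} may expose its options differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: these classes are NOT the Malhar API or the Cassandra driver.
final class UpsertFlowSketch {

  enum ListPlacement { APPEND, PREPEND }

  /** Stand-in for the POJO payload that represents one Cassandra row (step 2). */
  static final class UserRow {
    final String userId;
    final List<String> emails;

    UserRow(String userId, List<String> emails) {
      this.userId = userId;
      this.emails = emails;
    }
  }

  /** Stand-in for the execution context built by the upstream operator (step 3). */
  static final class ContextSketch<T> {
    T payload;                             // values that would be bound to the '?' markers
    Integer overridingTtlSeconds;          // null means "use the connection default"
    boolean updateOnlyIfPrimaryKeysExist;  // true turns the upsert into update-only
    ListPlacement listPlacement = ListPlacement.APPEND;
  }

  /** Step 4: derive the CQL text that a concrete operator might prepare for this context. */
  static String toCql(ContextSketch<UserRow> ctx) {
    StringBuilder cql = new StringBuilder("UPDATE ks.users");
    if (ctx.overridingTtlSeconds != null) {
      cql.append(" USING TTL ").append(ctx.overridingTtlSeconds);
    }
    cql.append(ctx.listPlacement == ListPlacement.PREPEND
        ? " SET emails = ? + emails"    // incoming list goes in front of the stored list
        : " SET emails = emails + ?");  // incoming list goes after the stored list
    cql.append(" WHERE user_id = ?");
    if (ctx.updateOnlyIfPrimaryKeysExist) {
      cql.append(" IF EXISTS");
    }
    return cql.toString();
  }

  public static void main(String[] args) {
    ContextSketch<UserRow> ctx = new ContextSketch<>();               // step 3a
    ctx.payload = new UserRow("u1", Arrays.asList("a@example.org"));  // step 3b
    ctx.overridingTtlSeconds = 3600;                                  // step 3c
    ctx.updateOnlyIfPrimaryKeysExist = true;
    ctx.listPlacement = ListPlacement.PREPEND;
    System.out.println(toCql(ctx));
  }
}
```

Because the list is extended with `emails = ? + emails` or `emails = emails + ?`, the stored list never has to be read first, which is the point made about low latency writes in the description.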
>   This operator supports the following features:
>   1. Highly customizable connection policies. This is achieved by specifying the ConnectionStateManager.
>      A good number of connection management aspects can be controlled via {@link ConnectionStateManager},
>      such as consistency, load balancing, connection retries, the table to use and the keyspace to use.
>      Please refer to the javadoc of {@link ConnectionStateManager}.
>   2. Support for collections: Maps, Lists and Sets are supported.
>      User defined types as part of collections are also supported.
>   3. Support for both adding entries to and removing entries from an existing collection.
>      The POJO field that represents a collection holds the entries that are to be added or removed.
>      This avoids the pattern of reading a column and then writing back its final value, which suits
>      low latency / high write applications because no read is needed in the process.
>   4. Support for list placements: the execution context can be used to specify where the new incoming list
>      is to be added (in case there is an existing list in the current column of the row being mutated).
>      Supported options are APPEND or PREPEND to an existing list.
>   5. Support for user defined types. A POJO can have fields that represent Cassandra columns of custom
>      user defined types. Concrete implementations of the operator provide a mapping of the Cassandra column name
>      to the TypeCodec that is to be used for that field inside Cassandra. Please refer to the javadoc of
>      {@link this.getCodecsForUserDefinedTypes() } for more details.
>   6. Support for custom mapping of POJO payload field names to Cassandra column names. Practically speaking,
>      POJO field names might not always match Cassandra column names, hence this support. This also avoids
>      writing a POJO just for the Cassandra operator, so an existing POJO can be passed to this operator.
>      Please refer to the javadoc of {@link this.getPojoFieldNameToCassandraColumnNameOverride()} for an example.
>   7. TTL support: a default TTL can be set for the connection (via {@link ConnectionStateManager}) and is then
>      used for all mutations. This TTL can be overridden at the tuple execution level to accommodate use cases
>      such as setting custom column expirations, which is typically useful in wide row implementations.
>   8. Support for counter column tables. The values inside the incoming POJO are added to or subtracted from
>      the counter column accordingly. Please note that the value is not set as an absolute value; it
>      represents the amount to be added to or subtracted from the current counter.
>   9. Support for composite primary keys. All the POJO fields that map to the composite
>      primary key are used to resolve the primary key in case of a composite primary key table.
>   10. Support for conditional updates: this operator can be used as an update-only operator as opposed to an
>       upsert operator, i.e. update only IF EXISTS. This is achieved by setting the appropriate boolean in the
>       {@link UpsertExecutionContext} tuple that is passed from the upstream operator.
>   11. Lenient mapping of POJO fields to Cassandra column names. By default, POJO field names are matched
>       case-insensitively against Cassandra column names. This can be further customized by overriding
>       mappings. Please refer to feature 6 above.
>   12. Defaults can be overridden at the tuple execution level for TTL and consistency policies.
>   13. Support for handling nulls, i.e. whether null values in the POJO are to be persisted as is or ignored, so
>       that the application need not perform a read to populate a POJO field if it is not available in the context.
>   14. A few autometrics are provided for monitoring the latency aspects of the cassandra cluster
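
Features 6 and 11 above (override mappings plus case-insensitive matching) could be resolved along the following lines. This is a minimal sketch under stated assumptions: `ColumnMappingSketch` and `resolve` are hypothetical names, the override map is assumed to go from POJO field name to Cassandra column name (as the method name getPojoFieldNameToCassandraColumnNameOverride suggests), and fields with no matching column are assumed to be skipped. The actual operator may resolve names differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only: not the Malhar implementation.
final class ColumnMappingSketch {

  /**
   * Resolves each POJO field to a column name. An explicit override wins;
   * otherwise the field is matched case-insensitively against the known columns.
   * Fields with no match are left out of the result (an assumption of this sketch).
   */
  static Map<String, String> resolve(List<String> pojoFields,
                                     List<String> columns,
                                     Map<String, String> overrides) {
    // Index the columns by lower-cased name for the case-insensitive lookup.
    Map<String, String> byLowerCase = new LinkedHashMap<>();
    for (String column : columns) {
      byLowerCase.put(column.toLowerCase(Locale.ROOT), column);
    }
    Map<String, String> resolved = new LinkedHashMap<>();
    for (String field : pojoFields) {
      String column = overrides.containsKey(field)
          ? overrides.get(field)
          : byLowerCase.get(field.toLowerCase(Locale.ROOT));
      if (column != null) {
        resolved.put(field, column);
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> mapping = resolve(
        Arrays.asList("userId", "Region", "mail", "scratch"),
        Arrays.asList("userid", "region", "email_address"),
        Collections.singletonMap("mail", "email_address"));
    // "scratch" has neither an override nor a matching column, so it is skipped.
    System.out.println(mapping);
  }
}
```

With this approach an existing POJO whose field names differ from the table's column names only needs an override entry for the fields that do not already match when case is ignored.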



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)