Posted to dev@apex.apache.org by "devendra tagare (JIRA)" <ji...@apache.org> on 2016/07/14 18:36:20 UTC

[jira] [Updated] (APEXMALHAR-2066) Add jdbc poller input operator

     [ https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

devendra tagare updated APEXMALHAR-2066:
----------------------------------------
    Description: 
Create a JDBC poller input operator that has the following features.

1. Poll an external JDBC store asynchronously from within the input operator.
2. Polling frequency and batch size should be configurable.
3. Should be idempotent.
4. Should be partitionable.
5. Should support both batch reads and polling.
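
A minimal sketch of features 1 and 2, assuming a background thread that fetches batches into a buffer which the operator thread later drains. The class AsyncPollerSketch, its methods, and the fetch function standing in for the JDBC query are hypothetical names for illustration, not the operator's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;

// Hypothetical illustration only; not the Malhar operator.
class AsyncPollerSketch {
  // Rows fetched by the poller thread wait here until the operator thread emits them.
  private final BlockingQueue<Long> buffer = new LinkedBlockingQueue<>();
  // Stand-in for the JDBC query: (lastKey, batchSize) -> up to batchSize keys > lastKey, ascending.
  private final BiFunction<Long, Integer, List<Long>> fetch;
  private final int batchSize;
  private final long pollIntervalMillis;
  private long lastKey;
  private ScheduledExecutorService scheduler;

  AsyncPollerSketch(BiFunction<Long, Integer, List<Long>> fetch,
                    long startKey, int batchSize, long pollIntervalMillis) {
    this.fetch = fetch;
    this.lastKey = startKey;
    this.batchSize = batchSize;
    this.pollIntervalMillis = pollIntervalMillis;
  }

  // One poll: fetch the next batch and remember the highest key seen.
  void pollOnce() {
    for (Long key : fetch.apply(lastKey, batchSize)) {
      buffer.add(key);
      lastKey = key;
    }
  }

  // Run pollOnce on a background thread at the configured interval.
  void start() {
    scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(this::pollOnce, 0, pollIntervalMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
  }

  // Called from the operator thread; never blocks on the database.
  List<Long> emitTuples(int max) {
    List<Long> out = new ArrayList<>();
    buffer.drainTo(out, max);
    return out;
  }
}
```

In this sketch pollOnce is meant to run only on the scheduler thread once start has been called; calling it directly is useful for deterministic testing.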


Assumptions for idempotency and partitioning:
1. The user needs to provide tableName, dbConnection, setEmitColumnList, and a look-up key.
2. Optionally, batchSize, pollInterval, a look-up key, and a where clause can be given.
3. This operator uses static partitioning to arrive at range queries for exactly-once reads.
   It creates a configured number of non-polling static partitions for fetching the existing data in the table, plus one additional partition for polling newly added data.
4. The table is assumed to have an ordered column on which range queries can be formed.
   The *key* column, on which polling is based, can be any column whose values are ever-increasing and which supports greater-than and less-than comparisons in SQL.
5. If an emitColumnList is provided, the keyColumn must be the first column in the list.
6. Range queries are formed using the JdbcMetaDataUtility. Output: a comma-separated list of the emit columns, e.g. columnA,columnB,columnC.
7. Only newly added data with increasing ids will be fetched by the polling JDBC partition.
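
The partitioning described in points 3 and 4 could look roughly like the sketch below. It assumes a numeric key column and that the minimum and maximum existing keys are already known. RangePartitionSketch and its methods are hypothetical names, and the SQL strings are illustrative rather than what the utility class actually generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Malhar implementation.
class RangePartitionSketch {
  // Split the closed key interval [minKey, maxKey] into partitionCount contiguous,
  // non-overlapping closed ranges {lower, upper}. If there are fewer keys than
  // partitions, the trailing ranges come out empty (upper < lower).
  static List<long[]> staticRanges(long minKey, long maxKey, int partitionCount) {
    List<long[]> ranges = new ArrayList<>();
    long total = maxKey - minKey + 1;
    long base = total / partitionCount;
    long extra = total % partitionCount; // the first 'extra' ranges get one more key
    long lower = minKey;
    for (int i = 0; i < partitionCount; i++) {
      long size = base + (i < extra ? 1 : 0);
      long upper = lower + size - 1;
      ranges.add(new long[] {lower, upper});
      lower = upper + 1;
    }
    return ranges;
  }

  // Query for one non-polling static partition: a bounded range over existing data.
  static String rangeQuery(String table, String columns, String keyColumn, long lower, long upper) {
    return "SELECT " + columns + " FROM " + table
        + " WHERE " + keyColumn + " >= " + lower + " AND " + keyColumn + " <= " + upper;
  }

  // Query for the single polling partition: everything after the last key seen.
  static String pollQuery(String table, String columns, String keyColumn, long lastKey) {
    return "SELECT " + columns + " FROM " + table
        + " WHERE " + keyColumn + " > " + lastKey + " ORDER BY " + keyColumn;
  }
}
```

For example, staticRanges(1, 10, 3) yields the ranges [1,4], [5,7], and [8,10], and the polling partition would then start from key 10. Because the ranges do not overlap, each existing row is read by exactly one static partition.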

For each window, the first and the last keys processed are saved using the FSWindowDataManager as (<lowerBound,upperBound>,operatorId,windowId). This (lowerBound,upperBound) pair is then used for recovery. The queries are constructed using the JDBCMetaDataUtility.
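
To illustrate why saving the per-window bounds gives idempotent replay, here is a small sketch. It deliberately does not use the FSWindowDataManager API: an in-memory map stands in for the durable store and a list of keys stands in for the SQL table. All names (WindowBoundsSketch, processWindow, resetTo) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the real operator persists bounds durably.
class WindowBoundsSketch {
  // windowId -> {lowerBound, upperBound}; an empty array marks a window that emitted nothing.
  private final Map<Long, long[]> savedBounds = new HashMap<>();
  private final List<Long> table; // ascending keys, stand-in for the SQL table
  private final int batchSize;
  private long lastKey;

  WindowBoundsSketch(List<Long> table, int batchSize, long startKey) {
    this.table = table;
    this.batchSize = batchSize;
    this.lastKey = startKey;
  }

  List<Long> processWindow(long windowId) {
    List<Long> out = new ArrayList<>();
    long[] bounds = savedBounds.get(windowId);
    if (bounds != null) {
      // Replay: re-read exactly the range that was emitted for this window before.
      if (bounds.length == 2) {
        for (long key : table) {
          if (key >= bounds[0] && key <= bounds[1]) {
            out.add(key);
          }
        }
        lastKey = bounds[1];
      }
      return out;
    }
    // Fresh window: read the next batch after the last key and record its bounds.
    for (long key : table) {
      if (key > lastKey && out.size() < batchSize) {
        out.add(key);
      }
    }
    if (out.isEmpty()) {
      savedBounds.put(windowId, new long[0]);
    } else {
      long upper = out.get(out.size() - 1);
      savedBounds.put(windowId, new long[] {out.get(0), upper});
      lastKey = upper;
    }
    return out;
  }

  // Simulates recovery: operator state rolls back, saved bounds survive.
  void resetTo(long checkpointedKey) {
    lastKey = checkpointedKey;
  }
}
```

The sketch relies on the stated assumption that new rows only ever arrive with larger keys, so a saved range selects the same rows when it is re-read after a failure.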

JDBCMetaDataUtility
A utility class used to retrieve the metadata for a given unique key of a SQL table. This class would emit range queries based on the given primary index.



  was:
Create a JDBC poller input operator that has the following features.

1. poll from external jdbc store asynchronously in the input operator.
2. polling frequency and batch size should be configurable.
3. should be idempotent.
4. should be partition-able.
5. should be batch + polling capable.


Assumptions for idempotency & partitioning,
1.User needs to provide tableName,dbConnection,setEmitColumnList,look-up key.
2.Optionally batchSize,pollInterval,Look-up key and a where clause can be given.
3.This operator uses static partitioning to arrive at range queries for exactly once reads
4.Assumption is that there is an ordered column using which range queries can be formed
5.If an emitColumnList is provided, please ensure that the keyColumn is the first column in the list
6.Range queries are formed using the JdbcMetaDataUtility Output - comma separated list of the emit columns eg columnA,columnB,columnC

Per window the first and the last key processed is saved using the FSWindowDataManager - (<lowerBound,UpperBound>,operatorId,windowId).This (lowerBound,upperBoundPair) is then used for recovery.The queries are constructed using the JDBCMetaDataUtility.

JDBCMetaDataUtility
A utility class used to retrieve the metadata for a given unique key of a SQL table. This class would emit range queries based on a primary index given.




> Add jdbc poller input operator
> ------------------------------
>
>                 Key: APEXMALHAR-2066
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2066
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Ashwin Chandra Putta
>            Assignee: devendra tagare


