Posted to dev@apex.apache.org by Priyanka Gugale <pr...@apache.org> on 2016/08/05 10:47:16 UTC

Update to AbstractJDBCPollInputOperator

Hi,

The poll operator in the repository has n-1 non-polling partitions that
read the DB in parallel, whereas the last partition keeps polling the DB to
fetch newly added records.

*For non-polling partitions,*
To assign a range of rows to a partition, we first fire an offset, limit
query to fetch the key column value and then fire a "between" query.
Instead, I suggest we use the offset, limit query directly to fetch the
records in that non-polling partition. The offset, limit query has some
performance hit, but we need to run it once anyway to get the key column
value, so combining "between" with "offset, limit" won't improve
performance. Also, as the partition is non-polling, this is a one-time
overhead.
*The limitation is* that we assume there won't be any out-of-order
insertions/deletions, i.e. records won't be inserted or deleted at random
offsets.
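
To make the proposal concrete, here is a rough sketch of how the n-1
non-polling partitions could each get a single offset, limit query. It is
only an illustration, not the operator's code: the function names, the
table/column names and the use of SQLite are all assumptions made for the
example, and the row count is taken as already known.

```python
import sqlite3


def partition_queries(table, key, total_rows, static_partitions):
    """Build one LIMIT/OFFSET query per non-polling partition.

    Hypothetical helper: names and signature are illustrative only.
    Rows are ordered by the key column so that every partition sees the
    same ordering and the ranges do not overlap.
    """
    size = total_rows // static_partitions
    queries = []
    for i in range(static_partitions):
        offset = i * size
        # The last non-polling partition also takes the remainder rows.
        if i < static_partitions - 1:
            limit = size
        else:
            limit = total_rows - offset
        queries.append(
            f"SELECT * FROM {table} ORDER BY {key} "
            f"LIMIT {limit} OFFSET {offset}"
        )
    return queries


def read_partitions(conn, queries):
    """Run each partition's query once and return the rows per partition."""
    return [conn.execute(q).fetchall() for q in queries]
```

Each query runs exactly once per partition, which is why the offset cost is
a one-time overhead. The sketch also shows the limitation: if rows are
inserted or deleted in the middle of the key range between two partitions'
reads, the offsets shift and rows can be skipped or read twice.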

*For the polling partition,*
As the polling partition fires the query at each polling interval, an
offset, limit query would be a performance problem. For this partition we
can do a ">" query, as the operator already does.
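
For comparison, the ">" query used by the polling partition could look
roughly like the sketch below. Again this is only an illustration under
assumptions: poll_once is a made-up name, SQLite stands in for the real
database, and the key column is assumed to be selected first and to be
increasing for newly added records.

```python
import sqlite3


def poll_once(conn, table, key, last_key):
    """Fetch rows whose key is greater than the last key already seen.

    Hypothetical helper for illustration. Returns the new rows and the
    updated last key, so the caller can pass it to the next poll.
    """
    rows = conn.execute(
        f"SELECT {key}, * FROM {table} WHERE {key} > ? ORDER BY {key}",
        (last_key,),
    ).fetchall()
    if rows:
        # The key column is selected first, so it is at index 0.
        last_key = rows[-1][0]
    return rows, last_key
```

Because the filter is on the key column, each poll only touches rows added
since the previous poll instead of skipping over an ever-growing offset.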

I am going ahead with this implementation unless anyone has suggestions.

-Priyanka