Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/11/27 19:55:06 UTC

[GitHub] [incubator-hudi] pushpavanthar commented on issue #969: [HUDI-251] JDBC incremental load to HUDI DeltaStreamer

pushpavanthar commented on issue #969: [HUDI-251] JDBC incremental load to HUDI DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/969#issuecomment-559230742
 
 
   I would like to add two points to make this feature more generic:
   
   - [ ] We might need support for a combination of more than one incrementing column. Incrementing columns can be of the following types:
   1. Timestamp column
   2. Auto-incrementing column
   3. Timestamp + auto-incrementing column
   Instead of the code figuring out the incremental pull strategy, it would be better if the user provided it through config for each table.
   When accepting a timestamp incrementing column, more than one column can contribute to this strategy. For example, when a row is created, only the `created_at` column is set and, let's say, `updated_at` is null by default. When the same row is updated, `updated_at` is assigned a timestamp. In such scenarios it's wise to consider both columns when forming the query.
   
   - [ ] We need to sort rows by the incrementing columns mentioned above so that rows can be fetched in chunks (you can make use of `defaultFetchSize` for MySQL). I understand this adds load on the database, but it lets us track the last pulled timestamp or auto-incrementing value and retry from that point for subsequent batches. This will be a saviour during failures.
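
To illustrate the chunked, resumable pull described in the second point, here is a minimal sketch in Python. It is not the DeltaStreamer implementation. It assumes an in-memory SQLite table standing in for MySQL, integer timestamps for brevity, and a hypothetical helper `fetch_chunks`. It uses the "timestamp + auto-incrementing" strategy as the checkpoint, because a timestamp alone could skip rows that share the same timestamp across a chunk boundary.

```python
import sqlite3

# Effective change time of a row: updated_at if set, else created_at.
TS = "COALESCE(updated_at, created_at)"


def fetch_chunks(conn, last_ts, last_id, upper_ts, chunk_size):
    """Yield (rows, checkpoint) pairs in (timestamp, id) order.

    Hypothetical helper: the checkpoint is the (timestamp, id) of the last
    row in each chunk, so a failed run can resume from the last checkpoint
    that was durably recorded.
    """
    while True:
        rows = conn.execute(
            f"SELECT id, {TS} AS ts FROM customers "
            f"WHERE ({TS} > ? OR ({TS} = ? AND id > ?)) AND {TS} < ? "
            "ORDER BY ts ASC, id ASC LIMIT ?",
            (last_ts, last_ts, last_id, upper_ts, chunk_size),
        ).fetchall()
        if not rows:
            return
        last_id, last_ts = rows[-1]
        yield rows, (last_ts, last_id)


# Demo data (assumed schema): rows 3 and 4 share the same timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, created_at INTEGER, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 10, None), (2, 10, 30), (3, 20, None), (4, 20, None), (5, 40, None)],
)

for rows, checkpoint in fetch_chunks(conn, 0, 0, 40, chunk_size=2):
    print(rows, "checkpoint:", checkpoint)
```

If the job fails after the first chunk, calling `fetch_chunks(conn, 20, 3, 40, 2)` with the recorded checkpoint picks up at row 4 rather than re-reading or skipping rows.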
   
   A sample MySQL query using the timestamp columns `created_at` and `updated_at` as incrementing columns might look like this:
   `SELECT * FROM inventory.customers WHERE COALESCE(inventory.customers.updated_at, inventory.customers.created_at) > $last_recorded_time AND COALESCE(inventory.customers.updated_at, inventory.customers.created_at) < $current_time ORDER BY COALESCE(inventory.customers.updated_at, inventory.customers.created_at) ASC`
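
As a rough sketch of how a user-supplied, per-table strategy could drive query formation, the Python snippet below builds the pull query for each of the three strategies listed in the first point. The names `IncrementalConfig` and `build_query` and the `:last_ts`, `:last_id`, `:upper_ts` placeholders are hypothetical and are not part of Hudi. Bound values are left as named placeholders for the database driver, and identifiers are validated because they cannot be bound as parameters.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Plain or dotted SQL identifiers only, e.g. "inventory.customers".
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


@dataclass(frozen=True)
class IncrementalConfig:
    """Hypothetical per-table config describing the pull strategy."""
    table: str
    timestamp_columns: Tuple[str, ...] = ()  # priority order for COALESCE
    incrementing_column: Optional[str] = None


def build_query(cfg: IncrementalConfig) -> str:
    names = [cfg.table, *cfg.timestamp_columns]
    if cfg.incrementing_column:
        names.append(cfg.incrementing_column)
    for name in names:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Several timestamp columns collapse into one effective timestamp.
    ts = None
    if len(cfg.timestamp_columns) == 1:
        ts = cfg.timestamp_columns[0]
    elif cfg.timestamp_columns:
        ts = "COALESCE(" + ", ".join(cfg.timestamp_columns) + ")"
    inc = cfg.incrementing_column

    if ts and inc:  # timestamp + auto-incrementing
        where = (f"({ts} > :last_ts OR ({ts} = :last_ts AND {inc} > :last_id))"
                 f" AND {ts} < :upper_ts")
        order = f"{ts} ASC, {inc} ASC"
    elif ts:  # timestamp only
        where = f"{ts} > :last_ts AND {ts} < :upper_ts"
        order = f"{ts} ASC"
    elif inc:  # auto-incrementing only
        where = f"{inc} > :last_id"
        order = f"{inc} ASC"
    else:
        raise ValueError("configure at least one incrementing column")
    return f"SELECT * FROM {cfg.table} WHERE {where} ORDER BY {order}"


print(build_query(
    IncrementalConfig("inventory.customers", ("updated_at", "created_at"))
))
```

With `timestamp_columns=("updated_at", "created_at")` and no incrementing column, the output has the same shape as the sample query above, with unqualified column names.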
