Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/03 14:40:01 UTC

[GitHub] [airflow] akocukcu commented on issue #10992: Transfer table/data from mssql to mssql

akocukcu commented on issue #10992:
URL: https://github.com/apache/airflow/issues/10992#issuecomment-721156021


   @shizidushu I have been trying to put together a proper solution for this issue while looking after my newborn :D (that's why it took so long).
   Solution: The equivalent of `"REPLACE INTO"` or `"INSERT ... ON CONFLICT DO UPDATE"` in SQL Server is the built-in [MERGE](https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql) statement.
   I can provide that by adding a `_generate_insert_sql` method to `MsSqlHook`. After that, it will be available for upserting **("small")** data.
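
To make the idea concrete, here is a minimal sketch of how a single-row MERGE statement could be generated. It is a standalone function, not the actual hook method: the name `generate_merge_sql`, its parameters, and the `%s` placeholder style are assumptions for illustration, and the table and column names are interpolated as-is, so they must come from trusted input.

```python
from typing import Sequence


def generate_merge_sql(
    table: str, target_fields: Sequence[str], key_fields: Sequence[str]
) -> str:
    """Build a parameterized T-SQL MERGE that upserts one row.

    Hypothetical helper: `key_fields` are the columns used to decide
    whether a row already exists in `table`.
    """
    if not target_fields or not key_fields:
        raise ValueError("target_fields and key_fields must not be empty")
    missing = [k for k in key_fields if k not in target_fields]
    if missing:
        raise ValueError(f"key fields not in target_fields: {missing}")

    # One placeholder per column; the driver binds the actual values.
    placeholders = ", ".join(["%s"] * len(target_fields))
    columns = ", ".join(f"[{c}]" for c in target_fields)
    on_clause = " AND ".join(f"target.[{k}] = source.[{k}]" for k in key_fields)
    non_keys = [c for c in target_fields if c not in key_fields]
    update_clause = ", ".join(f"target.[{c}] = source.[{c}]" for c in non_keys)
    insert_values = ", ".join(f"source.[{c}]" for c in target_fields)

    sql = (
        f"MERGE INTO {table} AS target "
        f"USING (VALUES ({placeholders})) AS source ({columns}) "
        f"ON {on_clause} "
    )
    # Skip the UPDATE branch when every column is part of the key.
    if update_clause:
        sql += f"WHEN MATCHED THEN UPDATE SET {update_clause} "
    sql += f"WHEN NOT MATCHED THEN INSERT ({columns}) VALUES ({insert_values});"
    return sql


if __name__ == "__main__":
    print(generate_merge_sql("dbo.users", ["id", "name"], ["id"]))
```

The statement is executed once per row, which is why this approach only suits small data sets.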
   
   However, the issue description says this will be used for incremental loads into a data warehouse. In my experience, running MERGE directly against a data warehouse table can produce unwanted results in terms of concurrency and performance. I have just written a [blog post](https://akocukcu.github.io/incremental-etl-pipeline-airflow.html) about this topic.
   In a nutshell, we should be aware that MERGE has specific drawbacks when it is used widely.
   
   My suggestion for incremental ETL is to use `GenericTransfer` combined with a staging table, plus additional `MsSqlHook` calls to execute stored procedures.
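
As a rough illustration of the staging approach, the sketch below generates the set-based SQL that could run around the copy step. Everything here is an assumption rather than the content of the linked blog post: the function name `staging_upsert_statements`, the table names, and the choice of a separate `UPDATE` and `INSERT` in place of MERGE. In a DAG, the truncate statement could serve as the transfer's pre-load step and the other two could live inside a stored procedure called after the load.

```python
from typing import Dict, Sequence


def staging_upsert_statements(
    target: str, staging: str, columns: Sequence[str], key_fields: Sequence[str]
) -> Dict[str, str]:
    """Return T-SQL statements for a staging-table upsert (hypothetical helper).

    Order of use: truncate staging, load staging (not shown here),
    update rows that already exist in the target, insert rows that do not.
    """
    if not columns or not key_fields:
        raise ValueError("columns and key_fields must not be empty")

    join = " AND ".join(f"t.[{k}] = s.[{k}]" for k in key_fields)
    non_keys = [c for c in columns if c not in key_fields]
    column_list = ", ".join(f"[{c}]" for c in columns)
    select_list = ", ".join(f"s.[{c}]" for c in columns)

    statements = {"truncate_staging": f"TRUNCATE TABLE {staging};"}

    # Only emit an UPDATE when there is at least one non-key column to change.
    if non_keys:
        set_clause = ", ".join(f"t.[{c}] = s.[{c}]" for c in non_keys)
        statements["update_existing"] = (
            f"UPDATE t SET {set_clause} "
            f"FROM {target} AS t INNER JOIN {staging} AS s ON {join};"
        )

    statements["insert_new"] = (
        f"INSERT INTO {target} ({column_list}) "
        f"SELECT {select_list} FROM {staging} AS s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {target} AS t WHERE {join});"
    )
    return statements


if __name__ == "__main__":
    steps = staging_upsert_statements(
        "dw.customers", "stg.customers", ["id", "name", "city"], ["id"]
    )
    for name, sql in steps.items():
        print(f"{name}: {sql}")
```

Because the target table is touched only by two set-based statements, the bulk copy itself never contends with readers of the warehouse table.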
   
   Finally, I can still add the upsert functionality using MERGE.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org