Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/06/01 08:30:00 UTC

[GitHub] [incubator-seatunnel] leo65535 opened a new issue, #1981: [Feature][sql] Data Transmission based on SQL

leo65535 opened a new issue, #1981:
URL: https://github.com/apache/incubator-seatunnel/issues/1981

   ### Search before asking
   
   - [X] I have searched the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) issues and found no similar feature request.
   
   
   ### Description
   
   
   There are many data transmission products, such as Apache Flume, Apache Sqoop,
   Alibaba DataX, and DTStack FlinkX, and more and more of them support creating
   data transmission tasks through SQL configuration. So I want to propose that
   SeaTunnel focus on SQL. We can get a lot of benefits from it, and it will be more in line
   with the goal of the project: `Next-generation high-performance, distributed, massive data integration framework`.
   
   SQL is a query language that allows queries to be composed from relational
   operators such as selection, filter, and join in a very intuitive way. We can use catalog management
   to manage these SQL statements instead of maintaining API configuration.
   
   So I suggest that we create a new branch focused on SQL, like the api-draft branch. Many features need
   to be developed quickly, such as CDC, breakpoint continuation, metrics, catalog management, and a web UI. The
   goal of the branch is `Data Transmission based on SQL`.
   
   [Uploading SeaTunnel 规划思考.pptx…]()
   
   
   
   ### Usage Scenario
   
   -
   
   ### Related issues
   
   -
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-seatunnel] leo65535 closed issue #1981: [Feature][sql] Data Transmission based on SQL

Posted by GitBox <gi...@apache.org>.

leo65535 closed issue #1981: [Feature][sql] Data Transmission based on SQL
URL: https://github.com/apache/incubator-seatunnel/issues/1981



[GitHub] [incubator-seatunnel] leo65535 commented on issue #1981: [Feature][sql] Data Transmission based on SQL

Posted by GitBox <gi...@apache.org>.

leo65535 commented on issue #1981:
URL: https://github.com/apache/incubator-seatunnel/issues/1981#issuecomment-1144599100

   @William-GuoWei Thanks for sharing your thinking. This proposal is still at an initial stage. We can design a SeaTunnel SQL that can be adapted to FlinkSQL or SparkSQL, but from my side I don't think that is important for now. We have spent a lot of time adapting to multiple engines, but progress has been very slow and tortuous, so it's better to support a single engine for now. As for https://www.getdbt.com/, we can add more connectors to support AWS, Google Cloud, etc.
   
   Here is a sketch of the architecture I drew. It seems that we should spend more time designing the whole system rather than focusing on multiple engines.
   
   ![image](https://user-images.githubusercontent.com/95013770/171588593-2a8b20dd-6969-4b3d-bf79-dd2c1608d5f2.png)
   



[GitHub] [incubator-seatunnel] William-GuoWei commented on issue #1981: [Feature][sql] Data Transmission based on SQL

Posted by GitBox <gi...@apache.org>.

William-GuoWei commented on issue #1981:
URL: https://github.com/apache/incubator-seatunnel/issues/1981#issuecomment-1143334139

   It is a great idea to do SQL-like transformation. But I didn't see how it would be implemented across Spark, Flink, DataFusion, etc. I could only see how it would be implemented on Flink. As far as I know, FlinkSQL does this very well, and the FlinkSQL API is quite different from SparkSQL. I don't know how to deal with that if we design the architecture based only on Flink and FlinkSQL. Perhaps a universal SQL API is needed before implementing SQL on Flink, and of course some custom APIs can be used in some way.
   On this idea, see https://www.getdbt.com/ for how universal SQL supports AWS, Google Cloud, etc. I hope that helps in some way.
   

