Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/06/01 08:30:00 UTC
[GitHub] [incubator-seatunnel] leo65535 opened a new issue, #1981: [Feature][sql] Data Transmission based on SQL
leo65535 opened a new issue, #1981:
URL: https://github.com/apache/incubator-seatunnel/issues/1981
### Search before asking
- [X] I have searched the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) issues and found no similar feature requirement.
### Description
There are many data transmission products, such as Apache Flume, Apache Sqoop,
Alibaba DataX, and DTStack FlinkX, and more and more of them support creating
data transmission tasks through SQL configuration. So I want to raise a topic: let
SeaTunnel focus on SQL. We can get a lot of benefits from it, and it would be more in line
with the project's goal: `Next-generation high-performance, distributed, massive data integration framework`.
SQL is a declarative query language that allows queries to be composed from relational
operators such as selection, filter, and join in a very intuitive way. We can use catalog management
to manage these SQL statements instead of maintaining API configuration.
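
As a rough illustration of the paragraph above, here is a minimal sketch of a transfer task expressed as a single SQL statement. It uses Python's built-in `sqlite3` module as a stand-in for real source and sink systems; the table names (`orders`, `users`, `sink_orders`) and the data are hypothetical, and this is not SeaTunnel syntax.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for both the
# source and the sink; all table names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, user_id INTEGER, amount REAL);
    CREATE TABLE users (id INTEGER, country TEXT);
    CREATE TABLE sink_orders (order_id INTEGER, country TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 10, 5.0), (2, 11, 250.0), (3, 10, 120.0);
    INSERT INTO users VALUES (10, 'CN'), (11, 'US');
""")

# The whole task is one statement: it selects columns, filters rows
# with WHERE, joins two sources, and writes the result to the sink.
conn.execute("""
    INSERT INTO sink_orders
    SELECT o.id, u.country, o.amount
    FROM orders AS o
    JOIN users AS u ON u.id = o.user_id
    WHERE o.amount >= 100
""")

rows = conn.execute(
    "SELECT order_id, country, amount FROM sink_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(2, 'US', 250.0), (3, 'CN', 120.0)]
```

In a real deployment the source and sink would be separate systems registered in a catalog, and the engine would plan and run the statement.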
So I suggest that we create a new branch focused on SQL, similar to the api-draft branch. Many features need
to be developed quickly, such as CDC, breakpoint continuation, metrics, catalog management, and a web UI. The
goal of the branch is `Data Transmission based on SQL`.
[Uploading SeaTunnel 规划思考.pptx…]()
### Usage Scenario
-
### Related issues
-
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] leo65535 closed issue #1981: [Feature][sql] Data Transmission based on SQL
Posted by GitBox <gi...@apache.org>.
leo65535 closed issue #1981: [Feature][sql] Data Transmission based on SQL
URL: https://github.com/apache/incubator-seatunnel/issues/1981
[GitHub] [incubator-seatunnel] leo65535 commented on issue #1981: [Feature][sql] Data Transmission based on SQL
Posted by GitBox <gi...@apache.org>.
leo65535 commented on issue #1981:
URL: https://github.com/apache/incubator-seatunnel/issues/1981#issuecomment-1144599100
@William-GuoWei Thanks for sharing your thoughts. This proposal is still at an initial stage. We could design a SeaTunnel SQL that can be adapted to FlinkSQL or SparkSQL, but from my side I don't think that is important for now. We have spent a lot of time adapting to multiple engines, but progress has been slow and tortuous, so it is better to support a single engine for now. As for https://www.getdbt.com/, we can add more connectors to support AWS, Google Cloud, etc.
Here is a sketch of the architecture I drew. It seems that we should spend more time designing the whole system rather than focusing on multiple engines.
![image](https://user-images.githubusercontent.com/95013770/171588593-2a8b20dd-6969-4b3d-bf79-dd2c1608d5f2.png)
[GitHub] [incubator-seatunnel] William-GuoWei commented on issue #1981: [Feature][sql] Data Transmission based on SQL
Posted by GitBox <gi...@apache.org>.
William-GuoWei commented on issue #1981:
URL: https://github.com/apache/incubator-seatunnel/issues/1981#issuecomment-1143334139
It is a great idea to do SQL-like transformation. But I didn't see how it would be implemented across Spark, Flink, DataFusion, etc.; I could only see how it would be implemented on Flink. As far as I know, FlinkSQL does this very well, and the FlinkSQL API is quite different from SparkSQL. I don't know how we would deal with that if we design the architecture based only on Flink and FlinkSQL. Perhaps a universal SQL API is needed before implementing SQL on Flink, and of course some custom APIs could be used in some way.
On this idea, see https://www.getdbt.com/ for how a universal SQL layer supports AWS, Google Cloud, etc. I hope that helps in some way.