Posted to commits@seatunnel.apache.org by "CheneyYin (via GitHub)" <gi...@apache.org> on 2023/04/05 09:59:46 UTC
[GitHub] [incubator-seatunnel] CheneyYin opened a new issue, #4502: [Improve][Core/Spark-Starter] Push transform operation from Spark Driver to Executors
CheneyYin opened a new issue, #4502:
URL: https://github.com/apache/incubator-seatunnel/issues/4502
### Search before asking
- [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
### Description
# Present situation
In `org.apache.seatunnel.core.starter.spark.execution.TransformExecuteProcessor#sparkTransform`, all data held on the executors is transferred to the Spark driver, because `Dataset<Row>.toLocalIterator` is invoked. Every row is then added to a list that lives purely in driver memory.
The implementation of `sparkTransform` will have the following negative effects:
- It moves data across the network unnecessarily.
- It is prone to OOM failures on the spark driver.
- It causes parallel computing to degenerate into serial computing.
# Improvement plan
- Replace `Dataset<Row>.toLocalIterator` with `Dataset<Row>.mapPartitions`.
- Implement an `Iterator` that computes lazily and never loads all data into memory.
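
A minimal sketch of the lazy iterator from the plan above, written in plain Java so it runs without Spark on the classpath. The class name `LazyTransformIterator` and the convention that a `null` result means "row filtered out" are assumptions for illustration, not taken from the SeaTunnel code base. In a Spark job, an iterator like this could be returned from a `MapPartitionsFunction#call` passed to `Dataset<Row>.mapPartitions`, so each executor transforms its own partition one row at a time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper: wraps a source iterator and applies a transform
 * to one element at a time, only when the consumer asks for it.
 * Assumption: a null transform result means the element is dropped.
 */
final class LazyTransformIterator<I, O> implements Iterator<O> {
    private final Iterator<I> source;
    private final Function<I, O> transform;
    private O buffered; // next transformed element, or null if none is buffered

    LazyTransformIterator(Iterator<I> source, Function<I, O> transform) {
        this.source = source;
        this.transform = transform;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source only until one non-null result is available.
        while (buffered == null && source.hasNext()) {
            buffered = transform.apply(source.next());
        }
        return buffered != null;
    }

    @Override
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        O result = buffered;
        buffered = null; // release the reference so at most one element is held
        return result;
    }
}
```

Because at most one transformed element is buffered at any time, memory use stays constant regardless of partition size, unlike collecting every row into a list.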
### Usage Scenario
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] CheneyYin closed issue #4502: [Improve][Core/Spark-Starter] Push transform operation from Spark Driver to Executors
Posted by "CheneyYin (via GitHub)" <gi...@apache.org>.
CheneyYin closed issue #4502: [Improve][Core/Spark-Starter] Push transform operation from Spark Driver to Executors
URL: https://github.com/apache/incubator-seatunnel/issues/4502