Posted to commits@seatunnel.apache.org by "CheneyYin (via GitHub)" <gi...@apache.org> on 2023/04/05 09:59:46 UTC

[GitHub] [incubator-seatunnel] CheneyYin opened a new issue, #4502: [Improve][Core/Spark-Starter] Push transform operation from Spark Driver to Executors

CheneyYin opened a new issue, #4502:
URL: https://github.com/apache/incubator-seatunnel/issues/4502

   ### Search before asking
   
   - [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   # Present situation
   In `org.apache.seatunnel.core.starter.spark.execution.TransformExecuteProcessor#sparkTransform`, all data held on the executors is transferred to the Spark driver, because `Dataset<Row>.toLocalIterator` is invoked. Every row is then added to an in-memory list.
   This implementation of `sparkTransform` has the following negative effects:
   - It sends data over the network unnecessarily.
   - It is prone to OOM failures on the Spark driver.
   - It causes parallel computation to degenerate into serial computation.
   
   # Improvement plan
   - Replace `Dataset<Row>.toLocalIterator` with `Dataset<Row>.mapPartitions`.
   - Implement an `Iterator` that computes lazily and never loads all the data into memory.
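
The second point of the plan can be sketched in plain Java, without Spark on the classpath. The sketch below is an illustration under assumptions, not the code that was eventually submitted: `LazyTransformIterator` and `LazyTransformDemo` are hypothetical names, and treating a `null` result as "drop this row" is an assumed convention. Because Spark's `MapPartitionsFunction#call` receives an `Iterator` for one partition and returns an `Iterator`, a wrapper of this shape could be returned from it so that each row is transformed on the executor only when it is pulled.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of SeaTunnel): wraps a source iterator and
 * applies the transform only when an element is requested, so at most one
 * transformed element is buffered at any time.
 */
final class LazyTransformIterator<I, O> implements Iterator<O> {
    private final Iterator<I> source;
    private final Function<? super I, ? extends O> transform;
    private O lookahead;          // the single buffered result, if any
    private boolean hasLookahead; // true while lookahead holds an unconsumed result

    LazyTransformIterator(Iterator<I> source, Function<? super I, ? extends O> transform) {
        this.source = source;
        this.transform = transform;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source until a non-null result is found or the source ends.
        // Assumption for this sketch: a null result means the row is dropped.
        while (!hasLookahead && source.hasNext()) {
            O out = transform.apply(source.next());
            if (out != null) {
                lookahead = out;
                hasLookahead = true;
            }
        }
        return hasLookahead;
    }

    @Override
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        O out = lookahead;
        lookahead = null; // release the reference so the row can be garbage collected
        hasLookahead = false;
        return out;
    }
}

/** Small demo: odd numbers are kept and rendered as strings, even numbers are dropped. */
final class LazyTransformDemo {
    public static void main(String[] args) {
        Iterator<Integer> rows = Arrays.asList(1, 2, 3, 4).iterator();
        Iterator<String> out =
                new LazyTransformIterator<>(rows, n -> n % 2 == 0 ? null : "row-" + n);
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

The design choice here is a one-element lookahead: `hasNext()` has to run the transform to know whether another non-dropped row exists, but nothing beyond that single row is ever held, which is what keeps memory use independent of partition size.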
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] CheneyYin closed issue #4502: [Improve][Core/Spark-Starter] Push transform operation from Spark Driver to Executors

Posted by "CheneyYin (via GitHub)" <gi...@apache.org>.
CheneyYin closed issue #4502: [Improve][Core/Spark-Starter] Push transform operation from Spark Driver to Executors
URL: https://github.com/apache/incubator-seatunnel/issues/4502

