Posted to dev@uniffle.apache.org by "jerqi (via GitHub)" <gi...@apache.org> on 2023/03/17 04:37:34 UTC

[GitHub] [incubator-uniffle] jerqi opened a new issue, #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

jerqi opened a new issue, #736:
URL: https://github.com/apache/incubator-uniffle/issues/736

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What would you like to be improved?
   
   This parameter is currently configured by users.
   Instead, it should be calculated from the ratio (partition receive data speed / HDFS write speed), which is more flexible.
   
   ### How should we improve?
   
   Introduce a new configuration `rss.server.hdfs.speed`. We record the timestamp at which the partition starts to receive data, and then the timestamp at which it starts to flush data. We can use the two timestamps to calculate the partition's receive data speed. We use `(partition receive data speed / HDFS write speed)` to calculate how many threads we need to write HDFS data, and use that number to create the HDFS writer.
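   
   A minimal Java sketch of this calculation, under stated assumptions: the receive speed is taken as the bytes received between the two timestamps divided by the elapsed time, `rss.server.hdfs.speed` is assumed to mean bytes per second for a single writer, and the result is clamped to a hypothetical upper bound. `PartitionConcurrencyEstimator` and its methods are illustrative names, not existing Uniffle classes.
   
   ```java
   import java.util.concurrent.TimeUnit;
   
   /** Hypothetical helper: estimates the HDFS write concurrency for one partition. */
   public final class PartitionConcurrencyEstimator {
   
       private final long hdfsWriteBytesPerSec; // assumed meaning of rss.server.hdfs.speed
       private final int maxConcurrency;        // hypothetical upper bound
   
       public PartitionConcurrencyEstimator(long hdfsWriteBytesPerSec, int maxConcurrency) {
           if (hdfsWriteBytesPerSec <= 0 || maxConcurrency <= 0) {
               throw new IllegalArgumentException("speed and max concurrency must be positive");
           }
           this.hdfsWriteBytesPerSec = hdfsWriteBytesPerSec;
           this.maxConcurrency = maxConcurrency;
       }
   
       /**
        * @param receivedBytes     bytes the partition received between the two timestamps
        * @param receiveStartNanos when the partition started to receive data
        * @param flushStartNanos   when the partition started to flush data
        * @return number of writers, in [1, maxConcurrency]
        */
       public int estimate(long receivedBytes, long receiveStartNanos, long flushStartNanos) {
           long elapsedNanos = flushStartNanos - receiveStartNanos;
           if (elapsedNanos <= 0 || receivedBytes <= 0) {
               return 1; // not enough information: fall back to a single writer
           }
           double elapsedSec = elapsedNanos / (double) TimeUnit.SECONDS.toNanos(1);
           double receiveBytesPerSec = receivedBytes / elapsedSec;
           // threads = ceil(receive speed / HDFS write speed), clamped to [1, maxConcurrency]
           int threads = (int) Math.ceil(receiveBytesPerSec / hdfsWriteBytesPerSec);
           return Math.max(1, Math.min(threads, maxConcurrency));
       }
   
       public static void main(String[] args) {
           // Assume one writer sustains 100 MiB/s and allow at most 10 writers.
           PartitionConcurrencyEstimator estimator =
               new PartitionConcurrencyEstimator(100L * 1024 * 1024, 10);
           // 1 GiB received in 2 s = 512 MiB/s -> ceil(5.12) = 6 writers.
           long flushStart = TimeUnit.SECONDS.toNanos(2);
           System.out.println(estimator.estimate(1024L * 1024 * 1024, 0L, flushStart)); // 6
       }
   }
   ```
   
   Rounding up means a partition that receives data slightly faster than one writer can flush still gets a second writer, while the clamp keeps a short burst from producing an unbounded number of HDFS files.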
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-uniffle] jerqi commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1501554219

   > I want to support specifying the max concurrency from the client side. WDYT? @jerqi
   
   I'm OK with this.


[GitHub] [incubator-uniffle] zuston commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1473303651

   > This parameter is currently configured by users.
   
   Correction: this parameter is configured on the server side.


[GitHub] [incubator-uniffle] jerqi commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1473388812

   > > Introduce a new configuration rss.server.hdfs.speed. We record the timestamp at which the partition starts to receive data, and then the timestamp at which it starts to flush data. We can use the two timestamps to calculate the partition's receive data speed. We use (partition receive data speed / HDFS write speed) to calculate how many threads we need to write HDFS data, and use that number to create the HDFS writer.
   > 
   > It looks too complex; the initial goal is to solve the single-partition lock problem. Will the above solution solve this?
   
   Yes. With this approach, we no longer need to adjust the parameter by hand.


[GitHub] [incubator-uniffle] zuston commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1501553389

   I want to support specifying the max concurrency from the client side. WDYT? @jerqi


[GitHub] [incubator-uniffle] jerqi closed issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi closed issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`
URL: https://github.com/apache/incubator-uniffle/issues/736


[GitHub] [incubator-uniffle] zuston commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1475522464

   Adding some details about the feature of concurrently writing a single partition:
   1. The purpose of this feature is to avoid contention on the write lock when writing data of the same partition.
   2. This feature has no extra cost: it doesn't use a dedicated thread pool and only leverages the flush thread pool in `ShuffleFlushManager` (see `PooledHdfsShuffleWriteHandler`).
   
   Based on the above, I think we could make some improvements:
   1. Let the client configure this value to control the write concurrency of a single partition.
   2. Make a best effort to reduce the number of files when there is no race condition, which is important for HDFS performance.
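   
   To illustrate the two improvements above, here is a hypothetical Java sketch, not the actual `PooledHdfsShuffleWriteHandler`: writes run on the caller's flush thread, so no extra pool is created; the maximum concurrency is passed in (for example from a client-side setting, which is an assumption); and an idle writer is always reused before a new writer (and therefore a new file) is opened. `PooledPartitionWriter` and its methods are illustrative names.
   
   ```java
   import java.util.concurrent.LinkedBlockingDeque;
   import java.util.concurrent.atomic.AtomicInteger;
   import java.util.function.Consumer;
   import java.util.function.IntFunction;
   
   /** Hypothetical bounded pool of writers for a single partition. */
   public final class PooledPartitionWriter<W> {
       private final LinkedBlockingDeque<W> idle = new LinkedBlockingDeque<>();
       private final AtomicInteger created = new AtomicInteger();
       private final int maxConcurrency;           // e.g. a client-supplied value (assumption)
       private final IntFunction<W> writerFactory; // opens the writer for file index i
   
       public PooledPartitionWriter(int maxConcurrency, IntFunction<W> writerFactory) {
           if (maxConcurrency <= 0) {
               throw new IllegalArgumentException("maxConcurrency must be positive");
           }
           this.maxConcurrency = maxConcurrency;
           this.writerFactory = writerFactory;
       }
   
       /** Runs one write on the calling (flush) thread using a pooled writer. */
       public void write(Consumer<W> action) {
           W writer = acquire();
           try {
               action.accept(writer);
           } finally {
               // Push to the front so the most recently used writer (file) is reused first.
               idle.addFirst(writer);
           }
       }
   
       private W acquire() {
           // Best effort to reduce the file count: reuse an idle writer whenever one exists.
           W writer = idle.pollFirst();
           if (writer != null) {
               return writer;
           }
           while (true) {
               int n = created.get();
               if (n >= maxConcurrency) {
                   try {
                       // Limit reached: wait until a concurrent write releases its writer.
                       return idle.takeFirst();
                   } catch (InterruptedException e) {
                       Thread.currentThread().interrupt();
                       throw new IllegalStateException("interrupted while waiting for a writer", e);
                   }
               }
               // No idle writer: open a new one (a new file) only while below the limit.
               if (created.compareAndSet(n, n + 1)) {
                   return writerFactory.apply(n);
               }
           }
       }
   
       /** Number of writers (files) opened so far. */
       public int createdWriters() {
           return created.get();
       }
   
       public static void main(String[] args) {
           PooledPartitionWriter<StringBuilder> pool =
               new PooledPartitionWriter<>(3, i -> new StringBuilder("file-" + i + ": "));
           // Sequential writes never overlap, so they all reuse a single writer.
           for (int i = 0; i < 5; i++) {
               final int block = i;
               pool.write(w -> w.append(block));
           }
           System.out.println(pool.createdWriters()); // 1
       }
   }
   ```
   
   Because released writers are pushed to the front of the deque, flushes with no contention keep appending to the same file; extra files appear only when flushes of the same partition actually overlap, and never more than `maxConcurrency` of them.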


[GitHub] [incubator-uniffle] zuston commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1473305435

   > Introduce a new configuration rss.server.hdfs.speed. We record the timestamp at which the partition starts to receive data, and then the timestamp at which it starts to flush data. We can use the two timestamps to calculate the partition's receive data speed. We use (partition receive data speed / HDFS write speed) to calculate how many threads we need to write HDFS data, and use that number to create the HDFS writer.
   
   It looks too complex; the initial goal is to solve the single-partition lock problem. Will the above solution solve this?


[GitHub] [incubator-uniffle] jerqi commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1473407937

   > > This parameter is currently configured by users.
   > 
   > Correction: this parameter is configured on the server side.
   
   Modifying this parameter when needed is still inconvenient, and it's difficult to judge whether the number of threads is enough.


[GitHub] [incubator-uniffle] jerqi commented on issue #736: [Improvement] More smart parameters about `rss.server.max.concurrency.of.single.partition.write`

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on issue #736:
URL: https://github.com/apache/incubator-uniffle/issues/736#issuecomment-1486116919

   closed by #744 

