Posted to dev@uniffle.apache.org by "zuston (via GitHub)" <gi...@apache.org> on 2023/02/10 08:13:41 UTC

[GitHub] [incubator-uniffle] zuston opened a new issue, #589: [Bug] Incorrect report of partition size to spark driver

zuston opened a new issue, #589:
URL: https://github.com/apache/incubator-uniffle/issues/589

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   On the shuffle writer side, the writer reports its partition sizes to the driver when the `stop` method of `RssShuffleWriter` is invoked. However, it appears to report the compressed size of every partition, which will cause incorrect optimization by AQE.
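
To illustrate the concern, here is a minimal, self-contained sketch. It is not Uniffle or Spark code: the class `PartitionSizeSketch` and the method `compressedSize` are hypothetical names, and `java.util.zip.Deflater` stands in for whatever codec the shuffle actually uses. It shows that two partitions with the same uncompressed size can have very different compressed sizes, so a size-based decision made from compressed lengths alone may treat them very differently.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class PartitionSizeSketch {

    // Hypothetical helper: returns how many bytes DEFLATE produces for the input.
    public static long compressedSize(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] buffer = new byte[4096];
        long total = 0;
        // Drain the compressor until all input has been consumed and flushed.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 1 << 20; // both partitions hold 1 MiB of raw data

        // Partition A: highly repetitive data (all zeros), compresses very well.
        byte[] repetitive = new byte[size];

        // Partition B: pseudo-random data, barely compresses at all.
        byte[] random = new byte[size];
        new Random(42).nextBytes(random);

        System.out.println("A: uncompressed=" + repetitive.length
                + " compressed=" + compressedSize(repetitive));
        System.out.println("B: uncompressed=" + random.length
                + " compressed=" + compressedSize(random));
    }
}
```

If only the compressed figures were reported, partition A would look far smaller than partition B even though both expand to the same amount of data on the read side.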
   
   ### Affects Version(s)
   
   master
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-uniffle] jerqi commented on issue #589: [Bug] Incorrect report of partition size to spark driver

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on issue #589:
URL: https://github.com/apache/incubator-uniffle/issues/589#issuecomment-1429040354

   > > I looked at the Spark code. Vanilla Spark also uses the compressed data length.
   > 
   > You are right. But I think this is wrong, and Spark has the same problem.
   
   We should keep the same behaviour as Spark. If you think it's a bug, it should be fixed in Spark first.




[GitHub] [incubator-uniffle] zuston commented on issue #589: [Bug] Incorrect report of partition size to spark driver

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #589:
URL: https://github.com/apache/incubator-uniffle/issues/589#issuecomment-1429036989

   > I looked at the Spark code. Vanilla Spark also uses the compressed data length.
   
   You are right. But I think this is wrong, and Spark has the same problem.




[GitHub] [incubator-uniffle] jerqi commented on issue #589: [Bug] Incorrect report of partition size to spark driver

Posted by "jerqi (via GitHub)" <gi...@apache.org>.
jerqi commented on issue #589:
URL: https://github.com/apache/incubator-uniffle/issues/589#issuecomment-1429028254

   I looked at the Spark code. Vanilla Spark also uses the compressed data length.




[GitHub] [incubator-uniffle] zuston commented on issue #589: [Bug] Incorrect report of partition size to spark driver

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #589:
URL: https://github.com/apache/incubator-uniffle/issues/589#issuecomment-1437798518

   > Do you have a concrete case where AQE doesn't work as expected? In that case, does vanilla Spark work as expected?
   
   I haven't run any experiments to confirm this. Maybe a dedicated test case is enough.
   
   > I believe you could file a JIRA with Spark and discuss with the Spark community the possibility of adding an additional field for the uncompressed size of shuffle data.
   
   Yes.




[GitHub] [incubator-uniffle] advancedxy commented on issue #589: [Bug] Incorrect report of partition size to spark driver

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on issue #589:
URL: https://github.com/apache/incubator-uniffle/issues/589#issuecomment-1437791484

   > However, it appears to report the compressed size of every partition, which will cause incorrect optimization by AQE.
   
   Do you have a concrete case where AQE doesn't work as expected? In that case, does vanilla Spark work as expected?
   
   > You are right. But I think this is wrong, and Spark has the same problem.
   
   I believe you could file a JIRA with Spark and discuss with the Spark community the possibility of adding an additional field for the uncompressed size of shuffle data.




[GitHub] [incubator-uniffle] zuston commented on issue #589: [Bug] Incorrect report of partition size to spark driver

Posted by "zuston (via GitHub)" <gi...@apache.org>.
zuston commented on issue #589:
URL: https://github.com/apache/incubator-uniffle/issues/589#issuecomment-1429020670

   PTAL @advancedxy @jerqi @xianjingfeng 

