Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/08/04 02:42:40 UTC

[GitHub] [incubator-uniffle] xianjingfeng opened a new issue, #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

xianjingfeng opened a new issue, #124:
URL: https://github.com/apache/incubator-uniffle/issues/124

   1. If we set `spark.rss.data.replica.write=2` and `spark.rss.data.replica=3`, data integrity cannot be guaranteed on any single shuffle server, right?
   2. But the method `org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler#readShuffleData` reads from only one shuffle server.
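
The arithmetic behind point 1 can be sketched as follows. This is a minimal illustration using general quorum reasoning, not Uniffle code: the class and method names are hypothetical, and applying the overlap rule to `spark.rss.data.replica.read` is an assumption based on this thread.

```java
// Hypothetical helper for illustration only; it is not part of Uniffle.
final class ReplicaQuorumSketch {

    // A write is accepted once `write` of the `replica` servers acknowledge it,
    // so up to (replica - write) servers may lack any given block.
    static int serversThatMayMissABlock(int replica, int write) {
        if (write < 1 || write > replica) {
            throw new IllegalArgumentException("need 1 <= write <= replica");
        }
        return replica - write;
    }

    // Classic quorum intersection: any `read` servers share at least one
    // server with any `write` servers exactly when read + write > replica.
    static boolean readQuorumOverlapsWriteQuorum(int replica, int write, int read) {
        return read + write > replica;
    }

    public static void main(String[] args) {
        // Values from this issue: replica=3, replica.write=2, replica.read=2.
        System.out.println(serversThatMayMissABlock(3, 2));         // prints 1
        System.out.println(readQuorumOverlapsWriteQuorum(3, 2, 2)); // prints true
    }
}
```

With these settings one of the three servers may be missing a block, which is why reading from a single server cannot guarantee complete data.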


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-uniffle] jerqi commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204846277

   > I would be happy to review this PR. You should avoid fetching redundant blocks from another server (because Spark has already consumed these blocks). RSS provides some skipping mechanisms for localfile and HDFS, but I am worried about memory data. @jerqi
   
   In my opinion, memory data should also support skipping, and our memory read process should be optimized.




[GitHub] [incubator-uniffle] jerqi closed issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
jerqi closed issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks
URL: https://github.com/apache/incubator-uniffle/issues/124




[GitHub] [incubator-uniffle] xianjingfeng commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204722297

   > Which version did you use
   
   internal version 0.5.0-snapshot




[GitHub] [incubator-uniffle] xianjingfeng commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204823533

   I know, but the application will fail




[GitHub] [incubator-uniffle] jerqi commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204859512

   > This will change the server's memory storage to add an "index", like HDFS.
   
   This problem should be discussed in another issue; we should also have a simple design doc.




[GitHub] [incubator-uniffle] frankliee commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
frankliee commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204808637

   > > Do you set `spark.rss.data.replica.read=2`
   > 
   > Yes
   > 
   > > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one of the servers.
   > 
   > But this step seems to execute before `readShuffleData`
   
   The metadata is acquired in advance, but the data integrity check is executed after all blocks have been fetched.
   In the current implementation, the client only fetches from “the first available” server to avoid extra read cost.
   But when the data on this first server is damaged, the final check reports "read inconsistent".
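
The final check described above can be sketched roughly as follows. This is a simplified illustration, not the actual Uniffle implementation: it assumes block IDs are held in plain `Set<Long>` collections, the class and method names are hypothetical, and the error text only mirrors the title of this issue.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the post-read integrity check; not Uniffle code.
final class BlockIntegritySketch {

    // `expected` stands for the block IDs known from the metadata acquired in
    // advance; `processed` stands for the block IDs actually fetched.
    static void checkProcessedBlockIds(Set<Long> expected, Set<Long> processed) {
        Set<Long> missing = new HashSet<>(expected);
        missing.removeAll(processed);
        if (!missing.isEmpty()) {
            int actual = expected.size() - missing.size();
            throw new IllegalStateException(
                "Blocks read inconsistent: expected " + expected.size()
                    + " blocks, actual " + actual + " blocks");
        }
    }
}
```

Because the check only runs once reading has finished, a first server with damaged or missing data is detected too late for the client to recover, and the application fails.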
   




[GitHub] [incubator-uniffle] xianjingfeng commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204852903

   Got it.




[GitHub] [incubator-uniffle] frankliee commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
frankliee commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204853549

   This will change the server's memory storage to add an "index", like HDFS.




[GitHub] [incubator-uniffle] jerqi commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204830693

   > > > Do you set `spark.rss.data.replica.read=2`
   > > 
   > > 
   > > Yes
   > > > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one of the servers.
   > > 
   > > 
   > > But this step seems to execute before `readShuffleData`
   > 
   > The metadata is acquired in advance, but the data integrity check is executed after all blocks have been fetched. In the current implementation, the client only fetches from “the first available” server to avoid extra read cost. But when the data on this first server is damaged, the final check reports "read inconsistent".
   
   This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?




[GitHub] [incubator-uniffle] frankliee commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
frankliee commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204843405

   I would be happy to review this PR. You should avoid fetching redundant blocks from another server (because Spark has already consumed these blocks).
   RSS provides some skipping mechanisms for localfile and HDFS,
   but I am worried about memory data. @jerqi 
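
The idea of falling back to another server while skipping already-consumed blocks can be sketched like this. It is only an illustration of the proposal in this thread, not the fix that was eventually merged: each server is modeled as a plain `Set<Long>` of the block IDs it holds, and all names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a fallback read with block skipping; not Uniffle code.
final class FallbackReadSketch {

    // Tries servers in order and returns the block IDs consumed. A block that
    // was already consumed from an earlier server is skipped on later servers.
    static Set<Long> readWithFallback(Set<Long> expected, List<Set<Long>> blocksHeldByServer) {
        Set<Long> processed = new LinkedHashSet<>();
        for (Set<Long> held : blocksHeldByServer) {
            if (processed.containsAll(expected)) {
                break; // data is complete, no need to contact further servers
            }
            for (Long blockId : held) {
                // add() returns false for an already-consumed block, so each
                // expected block is handed to the consumer at most once.
                if (expected.contains(blockId) && processed.add(blockId)) {
                    // A real client would pass the block data to Spark here.
                }
            }
        }
        if (!processed.containsAll(expected)) {
            throw new IllegalStateException(
                "Blocks read inconsistent: expected " + expected.size()
                    + " blocks, actual " + processed.size() + " blocks");
        }
        return processed;
    }
}
```

In this model the skip is a simple set lookup on the client side. For memory data the concern raised above still applies, since the server would need some way to serve only the blocks that are still missing.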




[GitHub] [incubator-uniffle] xianjingfeng commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204835800

   > This implementation seems a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?
   
   I am trying to do this, and I think it needs to be fixed together with #108




[GitHub] [incubator-uniffle] xianjingfeng commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
xianjingfeng commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204720579

   > Do you set `spark.rss.data.replica.read=2`
   
   Yes
   
   
   
   > As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one of the servers.
   
   But this step seems to execute before `readShuffleData`




[GitHub] [incubator-uniffle] frankliee commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
frankliee commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1204707010

   Do you set `spark.rss.data.replica.read=2`? It ensures that the bitmap metadata of blocks is written to 2 servers.
   
   As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any one of the servers.
   




[GitHub] [incubator-uniffle] jerqi commented on issue #124: [Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Posted by GitBox <gi...@apache.org>.
jerqi commented on issue #124:
URL: https://github.com/apache/incubator-uniffle/issues/124#issuecomment-1329349990

   closed by #276 

