
Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/09/23 03:24:22 UTC

[GitHub] [incubator-uniffle] zuston commented on issue #239: [Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

zuston commented on issue #239:
URL: https://github.com/apache/incubator-uniffle/issues/239#issuecomment-1255762455

> If the data is currently being flushed, we cannot guarantee that the number of blocks in the index is the same as the number of blocks in the data.

Yes, you are right. Please refer to #204. However, that PR won't solve the problems you mentioned; it only makes the read fail fast and logs an exception for analysis.
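
To make the index/data mismatch concrete, here is a minimal sketch of a check against the data file length. It is not the code from #204 or from `RssUtils#transIndexDataToSegments`: the class `IndexCheckSketch`, the type `IndexEntry`, and the method `usableEntries` are hypothetical names, and it assumes that index entries are appended in the same order as the data they describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names are Uniffle's actual classes.
class IndexCheckSketch {

  // One index entry: where a block starts in the data file and how long it is.
  static final class IndexEntry {
    final long blockId;
    final long offset;
    final long length;

    IndexEntry(long blockId, long offset, long length) {
      this.blockId = blockId;
      this.offset = offset;
      this.length = length;
    }
  }

  // Returns the leading entries whose bytes are fully present in the data file.
  // With failFast=true, throws at the first entry that ends past the data file.
  static List<IndexEntry> usableEntries(List<IndexEntry> index, long dataFileLen, boolean failFast) {
    List<IndexEntry> usable = new ArrayList<>();
    for (IndexEntry e : index) {
      long end = e.offset + e.length;
      if (end > dataFileLen) {
        if (failFast) {
          throw new IllegalStateException("Index entry for block " + e.blockId
              + " ends at " + end + " but the data file length is " + dataFileLen);
        }
        // Assumption: entries are in append order, so later ones are incomplete too.
        break;
      }
      usable.add(e);
    }
    return usable;
  }

  public static void main(String[] args) {
    List<IndexEntry> index = Arrays.asList(
        new IndexEntry(1L, 0L, 10L),
        new IndexEntry(2L, 10L, 10L),
        new IndexEntry(3L, 20L, 10L)); // the index already lists block 3
    long dataFileLen = 25L;            // but only 25 bytes of data are on disk so far
    System.out.println(usableEntries(index, dataFileLen, false).size() + " usable entries");
  }
}
```

With `failFast` set to `false` the reader keeps only the blocks that are completely on disk; with `true` it surfaces the mismatch immediately, which is closer to the fail-fast-and-log behaviour described above.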

Let's revisit this problem. As far as I know, the client reads in the order memory -> localfile -> HDFS, which means that reading incomplete data from a later tier does not affect the result.

For example, when part of the in-memory shuffle data is being flushed to HDFS or is waiting in the flush queue, the read client still gets it from memory. Although the index file in HDFS is incomplete, that part of the data has already been received from memory. So this is not a problem.
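
The argument can be modelled with a small sketch. It is only an illustration, not Uniffle's read client: `TieredReadSketch` and `readAll` are hypothetical names, each tier is reduced to a map from block ID to payload, and it assumes the client knows the set of expected block IDs and keeps the first copy of each block it sees.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the read order; not Uniffle's actual client code.
class TieredReadSketch {

  // Reads the tiers in the given order (e.g. memory, localfile, HDFS) and
  // keeps each expected block the first time it is seen.
  static Map<Long, String> readAll(Set<Long> expected, List<Map<Long, String>> tiers) {
    Map<Long, String> result = new LinkedHashMap<>();
    for (Map<Long, String> tier : tiers) {
      for (Map.Entry<Long, String> block : tier.entrySet()) {
        if (expected.contains(block.getKey())) {
          // A block already obtained from an earlier tier is not overwritten.
          result.putIfAbsent(block.getKey(), block.getValue());
        }
      }
    }
    if (!result.keySet().containsAll(expected)) {
      Set<Long> missing = new HashSet<>(expected);
      missing.removeAll(result.keySet());
      throw new IllegalStateException("Blocks missing from every tier: " + missing);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Long, String> memory = new HashMap<>();
    memory.put(3L, "c");                 // block 3 is still in memory / being flushed
    Map<Long, String> localFile = new HashMap<>();
    Map<Long, String> hdfs = new HashMap<>();
    hdfs.put(1L, "a");
    hdfs.put(2L, "b");                   // the HDFS index does not list block 3 yet

    Set<Long> expected = new HashSet<>(Arrays.asList(1L, 2L, 3L));
    System.out.println(readAll(expected, Arrays.asList(memory, localFile, hdfs)));
  }
}
```

In this model the incomplete HDFS tier is harmless because block 3 was already taken from memory. If a block were missing from every tier, the read would fail instead of silently returning partial data.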

So the problems you mentioned confuse me. There should be no problem with the design of the read path, although there may be some bugs.

By the way, I have also encountered inconsistent-block problems, but we are using the memory_localfile mode, and in our case they were caused by instability of the gRPC service; see #198.

Feel free to discuss more.
