You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/10/21 02:40:32 UTC

[GitHub] [incubator-uniffle] leixm opened a new pull request, #275: [Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

leixm opened a new pull request, #275:
URL: https://github.com/apache/incubator-uniffle/pull/275

   ### What changes were proposed in this pull request?
   For issue#239, Fix inconsistent blocks when reading shuffle data.
   
   ### Why are the changes needed?
   This problem will cause reading shuffle data failed.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Already added UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #275: [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1286551558

   @zuston  Can you help review this pr.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#discussion_r1001889587


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat
 
         bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
         bufferOffset += length;
+        totalLength += length;
+
+        // If ShuffleServer is flushing the file at this time, the length in the index file record may be greater

Review Comment:
   Make sense. This changelog looks OK for me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #275: [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#discussion_r1001549008


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat
 
         bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
         bufferOffset += length;
+        totalLength += length;
+
+        // If ShuffleServer is flushing the file at this time, the length in the index file record may be greater

Review Comment:
   I think this problem only occur that the map tasks are all finished and the data stored in memory is flushed to localfile/HDFS, right? And in this time, the spark client read the redundant index data. Right? 
   
   Analyzed from this perspective, if u drop these redundant data, does it will cause the data missing problem due to in hdfs client buffer insteading of memory/HDFS. I think it wont. The flushing data only will be flushed to HDFS and then removed from memory. And the method of `dataWriter.close()` in `HdfsShuffleWriteHandler` will ensure the data flushed to HDFS.
   
   So this change is OK and wont cause data lost. But I have a question that why not calling the `dataWriter.flush` and `indexWriter.flush` when writing one block to solve this problem. Does this will make performance regession?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
zuston commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1288553236

   > We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information, what do you think?
   
   It's OK for me. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1287030222

   Will it occur similar situation for local file storage type?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1291907796

   LGTM, wait for CI. Do you have another suggestion? @zuston 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1288349806

   The key question is whether it is possible to have more blocks in the index file than in the data file, which seems possible based on the code analysis, but I suggest to refer to the actual production environment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
jerqi commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1288558252

   > > We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information, what do you think?
   > 
   > It's OK for me.
   
   +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
zuston commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1288512640

   > Will it occur similar situation for local file storage type?
   
   I think it also will happen in localfile type.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1288346586

   Our production environment does not use localfile, so I am not quite sure if the same problem would exist.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] codecov-commenter commented on pull request #275: [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1286396142

   # [Codecov](https://codecov.io/gh/apache/incubator-uniffle/pull/275?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#275](https://codecov.io/gh/apache/incubator-uniffle/pull/275?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (be1385b) into [master](https://codecov.io/gh/apache/incubator-uniffle/commit/7a2f0ef9ebca81b9d8eb4b2e643fb19331d1e944?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7a2f0ef) will **decrease** coverage by `1.19%`.
   > The diff coverage is `58.82%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #275      +/-   ##
   ============================================
   - Coverage     59.71%   58.51%   -1.20%     
   + Complexity     1377     1301      -76     
   ============================================
     Files           166      157       -9     
     Lines          8918     8432     -486     
     Branches        853      824      -29     
   ============================================
   - Hits           5325     4934     -391     
   + Misses         3318     3232      -86     
   + Partials        275      266       -9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-uniffle/pull/275?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...orage/handler/impl/LocalFileClientReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9Mb2NhbEZpbGVDbGllbnRSZWFkSGFuZGxlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...e/storage/handler/impl/HdfsShuffleReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9IZGZzU2h1ZmZsZVJlYWRIYW5kbGVyLmphdmE=) | `55.00% <25.00%> (-2.15%)` | :arrow_down: |
   | [.../java/org/apache/uniffle/common/util/RssUtils.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS91bmlmZmxlL2NvbW1vbi91dGlsL1Jzc1V0aWxzLmphdmE=) | `68.86% <60.00%> (-0.47%)` | :arrow_down: |
   | [...e/uniffle/storage/handler/impl/HdfsFileReader.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9IZGZzRmlsZVJlYWRlci5qYXZh) | `81.81% <80.00%> (-0.95%)` | :arrow_down: |
   | [...storage/handler/impl/DataSkippableReadHandler.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3RvcmFnZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvdW5pZmZsZS9zdG9yYWdlL2hhbmRsZXIvaW1wbC9EYXRhU2tpcHBhYmxlUmVhZEhhbmRsZXIuamF2YQ==) | `81.81% <100.00%> (-2.56%)` | :arrow_down: |
   | [...ava/org/apache/spark/shuffle/RssShuffleHandle.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9Sc3NTaHVmZmxlSGFuZGxlLmphdmE=) | | |
   | [...e/spark/shuffle/reader/RssShuffleDataIterator.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS9yZWFkZXIvUnNzU2h1ZmZsZURhdGFJdGVyYXRvci5qYXZh) | | |
   | [...k/shuffle/writer/WrappedByteArrayOutputStream.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvV3JhcHBlZEJ5dGVBcnJheU91dHB1dFN0cmVhbS5qYXZh) | | |
   | [...che/spark/shuffle/writer/BufferManagerOptions.java](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Y2xpZW50LXNwYXJrL2NvbW1vbi9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvc3Bhcmsvc2h1ZmZsZS93cml0ZXIvQnVmZmVyTWFuYWdlck9wdGlvbnMuamF2YQ==) | | |
   | ... and [5 more](https://codecov.io/gh/apache/incubator-uniffle/pull/275/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] zuston commented on a diff in pull request #275: [ISSUE-239][Problem] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
zuston commented on code in PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#discussion_r1001549008


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat
 
         bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
         bufferOffset += length;
+        totalLength += length;
+
+        // If ShuffleServer is flushing the file at this time, the length in the index file record may be greater

Review Comment:
   I think this problem only occur that the map tasks are all finished and the data stored in memory is flushed to localfile/HDFS, right? And in this time, the spark client read the redundant index data. Right? 
   
   Analyzed from this perspective, if u drop these redundant data, does it will cause the data missing problem due to in hdfs client buffer insteading of memory/HDFS. I think it wont. The flushing data only will be flushed to HDFS and then removed from memory. And the method of `dataWriter.close()` in `HdfsShuffleWriteHandler` will ensure the data flushed to HDFS.
   
   So this change is OK and wont cause data lose. But I have a question that why not calling the `dataWriter.flush` and `indexWriter.flush` when writing one block to solve this problem. Does this will make performance regession?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on a diff in pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
leixm commented on code in PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#discussion_r1001794946


##########
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java:
##########
@@ -218,6 +220,13 @@ private static List<ShuffleDataSegment> transIndexDataToSegments(byte[] indexDat
 
         bufferSegments.add(new BufferSegment(blockId, bufferOffset, length, uncompressLength, crc, taskAttemptId));
         bufferOffset += length;
+        totalLength += length;
+
+        // If ShuffleServer is flushing the file at this time, the length in the index file record may be greater

Review Comment:
   It seems unreasonable to do a flush every time a block is written, so that the buffer of the hdfs client will not work, and it will make performance regession.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] leixm commented on pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
leixm commented on PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#issuecomment-1288541179

   > > Will it occur similar situation for local file storage type?
   > 
   > I think it also will happen in localfile type.
   
   We can consider the length of the data file when ShuffleServer#getLocalShuffleIndex generates index information, what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi commented on a diff in pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
jerqi commented on code in PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275#discussion_r1005301935


##########
storage/src/main/java/org/apache/uniffle/storage/handler/impl/HdfsFileReader.java:
##########
@@ -92,4 +94,11 @@ public synchronized void close() throws IOException {
   public Path getPath() {
     return path;
   }
+
+  public long getFileLen() throws IOException {
+    if (fileLen == -1) {

Review Comment:
   We could call `getFileLen` multiple times. If the data is flushed after we read the data at the first time. The length should vary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org


[GitHub] [incubator-uniffle] jerqi merged pull request #275: [ISSUE-239][BUG] RssUtils#transIndexDataToSegments should consider the length of the data file

Posted by GitBox <gi...@apache.org>.
jerqi merged PR #275:
URL: https://github.com/apache/incubator-uniffle/pull/275


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org
For additional commands, e-mail: issues-help@uniffle.apache.org