You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/09/30 20:57:29 UTC

[GitHub] [orc] omalley opened a new pull request #924: ORC-1014. Add details when we get IOException from the file system

omalley opened a new pull request #924:
URL: https://github.com/apache/orc/pull/924


   ### What changes were proposed in this pull request?
   
   It catches and wraps IOException in the data reader to give information about the range of bytes that the reader was trying to read.
   
   ### Why are the changes needed?
   
   To help debugging.
   
   ### How was this patch tested?
   
   By reading files in S3 with a version of Hadoop that had HADOOP-16109.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #924:
URL: https://github.com/apache/orc/pull/924


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725661709



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       https://github.com/apache/orc/blob/93b7aa67830104d6bd7fc55399947ee938549f55/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L99-L104
   Hey guys @omalley @dongjoon-hyun @autumnust 
   It looks better to refer to the `readFileData`  on the exception catch, because here we can get the path variable directly (file and path are both member var). 
   
   Call chain
   `readFileData`---->`readDiskRanges`----->`readRanges`
   `readFileData`---->`readDiskRanges`----->`zeroCopyReadRanges`
   
   And the parallel `zeroCopyReadRanges` method allows us to catch exceptions and print information about them in the same place.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] autumnust commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r719813922



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       is `file.toString` only giving some object reference? I am assuming you are trying to print the path instead




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725661709



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       https://github.com/apache/orc/blob/93b7aa67830104d6bd7fc55399947ee938549f55/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L99-L104
   Hey guys @omalley @dongjoon-hyun @autumnust 
   It looks better to refer to the `readFileData`  on the exception catch, because here we can get the path variable directly (file and path are both member var).
   
   Call chain
   `readFileData`---->`readDiskRanges`----->`readRanges`
   `readFileData`---->`readDiskRanges`----->`zeroCopyReadRanges`
   
   And the parallel `zeroCopyReadRanges` method allows us to catch exceptions and print information about them in the same place.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r736223903



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       Thank you for the snippet, @pavibhai .




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725661709



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       https://github.com/apache/orc/blob/93b7aa67830104d6bd7fc55399947ee938549f55/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L99-L104
   Hey guys @omalley @dongjoon-hyun @autumnust 
   It looks better to refer to the `readFileData`  on the exception catch, because here we can get the path variable directly
   
   Call chain
   `readFileData`---->`readDiskRanges`----->`readRanges`
   `readFileData`---->`readDiskRanges`----->`zeroCopyReadRanges`
   
   And the parallel `zeroCopyReadRanges` method allows us to catch exceptions and print information about them.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
omalley commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r721574996



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       Argh, yeah, I can see that it doesn't.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pavibhai commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
pavibhai commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r733860985



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       IMHO it feels like we should fix the FSInputStream to include the relevant information including the path. The S3AInputStream is very descriptive 
   
   > "Caused by: java.io.IOException: Failed while reading org.apache.hadoop.fs.FSDataInputStream@2f1173dd: S3AInputStream{s3a://<SNIP>.orc wrappedStream=open read policy=random pos=65539 nextReadPos=0 contentLength=121693 contentRangeStart=3 contentRangeFinish=65539 remainingInCurrentRequest=0
   StreamStatistics{OpenOperations=5, CloseOperations=4, Closed=4, Aborted=0, SeekOperations=7, ReadExceptions=0, ForwardSeekOperations=3, BackwardSeekOperations=4, BytesSkippedOnSeek=148, BytesBackwardsOnSeek=146391, BytesRead=106621, BytesRead excluding skipped=106473, ReadOperations=15, ReadFullyOperations=8, ReadsIncomplete=8, BytesReadInClose=64722, BytesDiscardedInAbort=0, InputPolicy=2, InputPolicySetCount=2}}"
   
   I would propose that:
   * We have the exception enhanced with the InputStream.toString
   * We further enhance the exception as suggested by @guiyanakuang to include the path reference in readFileData




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #924:
URL: https://github.com/apache/orc/pull/924#issuecomment-948766538


   Gentle ping, @omalley . How do you want to proceed this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
omalley commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r721569492



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       Actually, especially on S3 it prints a bunch of other data, which is helpful. I haven't looked at DFSInputStream recently, does it just use the Object.toString?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r721847070



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       Could you address @autumnust 's comment?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725661709



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       https://github.com/apache/orc/blob/93b7aa67830104d6bd7fc55399947ee938549f55/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L99-L104
   Hey guys @omalley @dongjoon-hyun @autumnust 
   It looks better to refer to the `readFileData`  on the exception catch, because here we can get the path variable directly (file and path are both member var). 
   
   update:
   By keeping the length / offset exception thrown, I mean catch IOException again in `readFileData`, so we can print the full message together.
   
   Call chain
   `readFileData`---->`readDiskRanges`----->`readRanges`
   `readFileData`---->`readDiskRanges`----->`zeroCopyReadRanges`
   
   And the parallel `zeroCopyReadRanges` method allows us to catch exceptions and print information about them in the same place.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725661709



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       https://github.com/apache/orc/blob/93b7aa67830104d6bd7fc55399947ee938549f55/java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java#L99-L104
   Hey guys @omalley @dongjoon-hyun @autumnust 
   It looks better to refer to the `readFileData`  on the exception catch, because here we can get the path variable directly
   
   Call chain
   `readFileData`---->`readDiskRanges`----->`readRanges`
   `readFileData`---->`readDiskRanges`----->`zeroCopyReadRanges`
   
   And the parallel `zeroCopyReadRanges` method allows us to catch exceptions and print information about them in the same place.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725545400



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       It would be great if you can include the real output example in the PR description.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r725545314



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       It seems that there is no way to print out the file path. Shall we print out `offset` and `buffer.length` only, @omalley ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] autumnust commented on a change in pull request #924: ORC-1014. Add details when we get IOException from the file system

Posted by GitBox <gi...@apache.org>.
autumnust commented on a change in pull request #924:
URL: https://github.com/apache/orc/pull/924#discussion_r719813922



##########
File path: java/core/src/java/org/apache/orc/impl/RecordReaderUtils.java
##########
@@ -414,7 +414,15 @@ static void readRanges(FSDataInputStream file,
     long offset = first.getOffset();
     int readSize = (int) (computeEnd(first, last) - offset);
     byte[] buffer = new byte[readSize];
-    file.readFully(offset, buffer, 0, buffer.length);
+    try {
+      file.readFully(offset, buffer, 0, buffer.length);
+    } catch(IOException e) {
+      throw new IOException(String.format("Failed while reading %s %d:%d",
+                                          file,

Review comment:
       is `FSDataInputStream.toString` only giving some object reference? I am assuming you are trying to print the path instead




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org