You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2020/11/21 14:43:52 UTC

[GitHub] [orc] pgaref opened a new pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

pgaref opened a new pull request #570:
URL: https://github.com/apache/orc/pull/570


   ### What changes were proposed in this pull request?
   This PR aims to add ReaderImpl.extractFileTail back to mitigate backward compatibility (partially as StripeStatistics are removed from ORCTail).
   
   ### Why are the changes needed?
   Apache ORC 1.6.0 ~ 1.6.5 removes extractFileTail  from the ReaderImpl. This PR adds API backward compatibility.
   
   
   ### How was this patch tested?
   TODO
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #570:
URL: https://github.com/apache/orc/pull/570#discussion_r528912903



##########
File path: java/core/src/test/org/apache/orc/impl/TestReaderImpl.java
##########
@@ -368,6 +366,28 @@ public void testOrcTailStripeStats() throws Exception {
           stats.get(0).getColumn(5).getTimestampStatistics();
       assertEquals(-28800000, tsStats.getMinimumUtc());
       assertEquals(-28550000, tsStats.getMaximumUtc());
+
+      // Test Tail and Stats extraction from ByteBuffer
+      ByteBuffer tailBuffer = tail.getSerializedTail();
+      OrcTail extractedTail = ReaderImpl.extractFileTail(tailBuffer);
+
+      assertEquals(tail.getTailBuffer(), extractedTail.getTailBuffer());
+      assertEquals(tail.getTailBuffer().getData(), extractedTail.getTailBuffer().getData());
+      assertEquals(tail.getTailBuffer().getOffset(), extractedTail.getTailBuffer().getOffset());
+      assertEquals(tail.getTailBuffer().getEnd(), extractedTail.getTailBuffer().getEnd());
+
+      assertEquals(tail.getMetadataOffset(), extractedTail.getMetadataOffset());
+      assertEquals(tail.getMetadataSize(), extractedTail.getMetadataSize());
+
+      Reader dummyReader = new ReaderImpl(null,

Review comment:
       For testing, this looks okay for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-731600730


   Thank you for making this efforts, @pgaref .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-732219355


   Currently using a dummy reader to retrieve Statistics variant from Tail buffer -- we could also have a Utility method to avoid Reader initialisation. Thoughts @omalley @dongjoon-hyun  ? 
   https://github.com/apache/orc/pull/570/files#diff-059408b5b3948fe28049eb419a2516a755be6e98706308370d8c2af81f5b0e15R382


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #570:
URL: https://github.com/apache/orc/pull/570


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-732346581


   BTW, @pgaref . Did you run this with old program like Hive/Spark?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-733836317


   > BTW, @pgaref . Did you run this with old program like Hive/Spark?
   
   Hey @dongjoon-hyun -- tried Tail extraction from the Hive side and was working fine (planning to check Spark at some point too).
   However, bumping Hive to 1.6 entirely is a much bigger effort as LLAP relies on too many ORC internal APIs (mostly because of their tightly coupled relationship).
   
   I started the effort as part of the Umbrella https://issues.apache.org/jira/browse/HIVE-23553 -- still early in the process though.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref edited a comment on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
pgaref edited a comment on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-733932346


   > Is there a reason not to add `deprecated` to `ensureOrcFooter`?
   > 
   > * https://github.com/apache/orc/pull/570/files#r528911408
   
   Hey @dongjoon-hyun  -- I skipped it since its not user-facing method but it could be helpful to have indeed.
   Adding it now


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-734010786


   Thank you so much, @pgaref !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #570:
URL: https://github.com/apache/orc/pull/570#discussion_r528911408



##########
File path: java/core/src/java/org/apache/orc/impl/ReaderImpl.java
##########
@@ -436,6 +436,31 @@ protected static void ensureOrcFooter(FSDataInputStream in,
     }
   }
 
+  /**
+   * Ensure this is an ORC file to prevent users from trying to read text
+   * files or RC files as ORC files.
+   * @param psLen the postscript length
+   * @param buffer the tail of the file
+   */
+  protected static void ensureOrcFooter(ByteBuffer buffer, int psLen) throws IOException {

Review comment:
       Could you add `@deprecated` to give a proper warning?

##########
File path: java/core/src/java/org/apache/orc/impl/ReaderImpl.java
##########
@@ -643,6 +668,48 @@ private static void read(FSDataInputStream file,
     }
   }
 
+  public static OrcTail extractFileTail(ByteBuffer buffer)

Review comment:
       Could you add `@deprecated` to give a proper warning?

##########
File path: java/core/src/java/org/apache/orc/impl/ReaderImpl.java
##########
@@ -643,6 +668,48 @@ private static void read(FSDataInputStream file,
     }
   }
 
+  public static OrcTail extractFileTail(ByteBuffer buffer)
+      throws IOException {
+    return extractFileTail(buffer, -1);
+  }
+
+  public static OrcTail extractFileTail(ByteBuffer buffer, long modificationTime)

Review comment:
       Could you add `@deprecated` to give a proper warning?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #570:
URL: https://github.com/apache/orc/pull/570#issuecomment-733932346


   > Is there a reason not to add `deprecated` to `ensureOrcFooter`?
   > 
   > * https://github.com/apache/orc/pull/570/files#r528911408
   
   Hey @dongjoon-hyun  -- I skipped it since its not user-facing method but it would be helpful to have the annotation there.
   Adding it now


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #570: ORC-685: Add `ReaderImpl.extractFileTail` back

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #570:
URL: https://github.com/apache/orc/pull/570#discussion_r530610434



##########
File path: java/core/src/java/org/apache/orc/impl/ReaderImpl.java
##########
@@ -436,6 +436,31 @@ protected static void ensureOrcFooter(FSDataInputStream in,
     }
   }
 
+  /**
+   * Ensure this is an ORC file to prevent users from trying to read text
+   * files or RC files as ORC files.
+   * @param psLen the postscript length
+   * @param buffer the tail of the file
+   */
+  protected static void ensureOrcFooter(ByteBuffer buffer, int psLen) throws IOException {

Review comment:
       Gentle ping, @pgaref . This should be marked as `@deprecated`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org