You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/09/09 00:25:00 UTC

[GitHub] [orc] omalley opened a new pull request #904: ORC-984: Save the software version that wrote each ORC file.

omalley opened a new pull request #904:
URL: https://github.com/apache/orc/pull/904


   
   ### What changes were proposed in this pull request?
   
   I add a string to the file footer that records the version that wrote the file. We already had recorded the implementation that wrote the file as Footer.writer. I added a method that combines these two fields in the reader to produce a user facing string that describes the software version.
   
   I also add a field to the meta data tool to show the version that wrote the file.
   
   Because of that change and the fact that the files change size based on whether the ORC version is a snapshot or not, I had to extend the tests for TestFileDump to allow some slop for the size and to ignore the file version.
   
   ### How was this patch tested?
   
   It passes the unit tests after I updated the tools tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on a change in pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
omalley commented on a change in pull request #904:
URL: https://github.com/apache/orc/pull/904#discussion_r706511046



##########
File path: java/core/src/java/org/apache/orc/Reader.java
##########
@@ -158,6 +158,16 @@
    */
   OrcFile.WriterVersion getWriterVersion();
 
+  /**
+   * Get the implementation and version of the software that wrote the file.
+   * It defaults to "ORC Java" for old files. For current files, we include the
+   * version also.
+   * @since 1.5.13

Review comment:
       That was my intention.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #904:
URL: https://github.com/apache/orc/pull/904#discussion_r705136996



##########
File path: java/core/src/java/org/apache/orc/Reader.java
##########
@@ -158,6 +158,16 @@
    */
   OrcFile.WriterVersion getWriterVersion();
 
+  /**
+   * Get the implementation and version of the software that wrote the file.
+   * It defaults to "ORC Java" for old files. For current files, we include the
+   * version also.
+   * @since 1.5.13

Review comment:
       Thank you for adding `since` info, @omalley .
   BTW, do we need to backport this new feature to `branch-1.5`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
omalley commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-915662913


   The software version in the API and tool shows up as "ORC Java 1.8.0-SNAPSHOT". When run from IntelliJ the unit tests can't find the software version, so it comes in "ORC Java unknown".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
omalley commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-917270622


   BTW, to generate the new c++ ORC file, I wrote a little utility that takes an ORC file and rewrites it using c++. It is in https://github.com/omalley/orc/tree/cpp-convert .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
omalley commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-915663381


   I haven't updated the C++ reader, writer, and tool yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-916067095


   > BTW, according to GitHub Action jobs, all Java11+ seems to complain for some reasons.
   
   This should be a bug in findbugs. Java 11 compiles try-with-resources to generate bytecode differently than java8. Findbugs is considered RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE from bytecode analysis. 
   https://github.com/spotbugs/spotbugs/issues/259
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #904:
URL: https://github.com/apache/orc/pull/904#discussion_r706865508



##########
File path: c++/src/Common.cc
##########
@@ -58,12 +58,38 @@ namespace orc {
         return "ORC-101";
       case WriterVersion_ORC_135:
         return "ORC-135";
+      case WriterVersion_ORC_517:
+        return "ORC-517";
+      case WriterVersion_ORC_203:
+        return "ORC-203";
+      case WriterVersion_ORC_14:
+        return "ORC-14";
     }
     std::stringstream buffer;
     buffer << "future - " << version;
     return buffer.str();
   }
 
+  std::string writerIdToString(uint32_t id) {
+    switch (id) {
+    case ORC_JAVA_WRITER:

Review comment:
       This indentation seems to be different from the other `switch` statements in this file




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] omalley commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
omalley commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-917261573


   Ok, the changes for C++ were relatively straightforward. I did update the C++ with the new writer versions, such as the important ORC-14.
   
   I updated the orc_split_elim_new.orc so that it was written by the 1.8.0-SNAPSHOT. I also added orc_split_elim_cpp.orc that is the same file written by c++. The Java and C++ tools show the files as having being written by the corresponding versions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-917482663


   So, everything is ready, right, @omalley ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #904:
URL: https://github.com/apache/orc/pull/904


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] guiyanakuang edited a comment on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
guiyanakuang edited a comment on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-916067095


   > BTW, according to GitHub Action jobs, all Java11+ seems to complain for some reasons.
   
   This should be a bug in findbugs. Java 11 compiles try-with-resources to generate bytecode differently than java8. Findbugs analyze bytecode matche RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE.
   https://github.com/spotbugs/spotbugs/issues/259
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #904:
URL: https://github.com/apache/orc/pull/904#discussion_r706674348



##########
File path: tools/test/TestMatch.cc
##########
@@ -448,17 +472,34 @@ namespace orc {
 					"decimal1:decimal(16,6),"
                                         "ts:timestamp>"),
                                        "0.12",
+				       "ORC Java 1.8.0-SNAPSHOT",
                                        25000,
-                                       1981,
+                                       1980,
                                        1,
                                        CompressionKind_ZLIB,
                                        262144,
                                        10000,
                                        std::map<std::string, std::string>()),
+                    OrcFileDescription("orc_split_elim_cpp.orc",
+                                       "orc_split_elim_cpp.jsn.gz",
+                                       ("struct<userid:bigint,string1:string,"
+                                        "subtype:double,"
+					"decimal1:decimal(16,6),"

Review comment:
       ditto.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #904:
URL: https://github.com/apache/orc/pull/904#discussion_r706674156



##########
File path: tools/test/TestMatch.cc
##########
@@ -163,6 +167,7 @@ namespace orc {
                                        "TestOrcFile.columnProjection.jsn.gz",
                                        "struct<int1:int,string1:string>",
                                        "0.12",
+				       "ORC Java",

Review comment:
       Indentation seems to be broken due to `tab` character.

##########
File path: tools/test/TestMatch.cc
##########
@@ -182,6 +187,7 @@ namespace orc {
                                        "string>>,map:map<string,struct<int1:"
                                        "int,string1:string>>>",
                                        "0.12",
+				       "ORC Java",

Review comment:
       ditto




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #904: ORC-984: Save the software version that wrote each ORC file.

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #904:
URL: https://github.com/apache/orc/pull/904#issuecomment-915906235


   BTW, according to GitHub Action jobs, all Java11+ seems to complain for some reasons.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org