You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/27 03:28:41 UTC

[GitHub] [hive] jcamachor commented on a change in pull request #1823: HIVE-23553: Upgrade ORC version to 1.6.7

jcamachor commented on a change in pull request #1823:
URL: https://github.com/apache/hive/pull/1823#discussion_r564992785



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileFormatProxy.java
##########
@@ -47,12 +47,14 @@ public SplitInfos applySargToMetadata(
     OrcTail orcTail = ReaderImpl.extractFileTail(fileMetadata);
     OrcProto.Footer footer = orcTail.getFooter();
     int stripeCount = footer.getStripesCount();
-    boolean writerUsedProlepticGregorian = footer.hasCalendar()
-        ? footer.getCalendar() == OrcProto.CalendarKind.PROLEPTIC_GREGORIAN
-        : OrcConf.PROLEPTIC_GREGORIAN_DEFAULT.getBoolean(conf);
+    // Always convert To PROLEPTIC_GREGORIAN

Review comment:
       Why is it OK to use proleptic calendar always here? Could we leave short explanation in the comment for when we need to revisit this code?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java
##########
@@ -282,6 +280,56 @@ public String toString() {
     }
   }
 
+  public static boolean[] findPresentStreamsByColumn(

Review comment:
       Can we add javadoc for these public static utility methods? If they are used only in this class, should we change their visibility?

##########
File path: ql/src/test/results/clientpositive/llap/schema_evol_orc_nonvec_part_all_primitive.q.out
##########
@@ -687,11 +687,11 @@ POSTHOOK: Input: default@part_change_various_various_timestamp_n6
 POSTHOOK: Input: default@part_change_various_various_timestamp_n6@part=1
 #### A masked pattern was here ####
 insert_num	part	c1	c2	c3	c4	c5	c6	c7	c8	c9	c10	c11	c12	b
-101	1	1970-01-01 00:00:00.001	1969-12-31 23:59:59.872	NULL	1969-12-07 03:28:36.352	NULL	NULL	NULL	NULL	6229-06-28 02:54:28.970117179	6229-06-28 02:54:28.97011	6229-06-28 02:54:28.97011	1950-12-18 00:00:00	original
-102	1	1970-01-01 00:00:00	1970-01-01 00:00:00.127	1970-01-01 00:00:32.767	1970-01-25 20:31:23.647	NULL	NULL	NULL	NULL	5966-07-09 03:30:50.597	5966-07-09 03:30:50.597	5966-07-09 03:30:50.597	2049-12-18 00:00:00	original
+101	1	1970-01-01 00:00:01	1969-12-31 23:57:52	NULL	1901-12-13 20:45:52	NULL	NULL	NULL	NULL	6229-06-28 02:54:28.970117179	6229-06-28 02:54:28.97011	6229-06-28 02:54:28.97011	1950-12-18 00:00:00	original

Review comment:
       This shifting for timestamp values does not seem right (or at least I cannot make sense of it). Could you explain what is going on here? Some of the shifting is significant: For those, I remember there were some backwards incompatible changes in schema evolution in 1.6.x, it may be related to that? However, other shifting seems a bit more suspicious, e.g., 1 second, ~2 minutes?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
##########
@@ -2585,6 +2590,7 @@ private static TreeReader getPrimitiveTreeReader(final int columnIndex,
             .setColumnEncoding(columnEncoding)
             .setVectors(vectors)
             .setContext(context)
+            .setIsInstant(columnType.getCategory()  == TypeDescription.Category.TIMESTAMP_INSTANT)

Review comment:
       As @mustafaiman  mentioned, I think this should be always false indeed: TIMESTAMP_INSTANT is equivalent to TIMESTAMP_WITH_LOCAL_TIME_ZONE type in Hive. AFAIK support to read/write timestamp with local time zone in ORC is not implemented yet.

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
##########
@@ -325,7 +326,7 @@ public void testReadFormat_0_11() throws Exception {
         + "binary,string1:string,middle:struct<list:array<struct<int1:int,"
         + "string1:string>>>,list:array<struct<int1:int,string1:string>>,"
         + "map:map<string,struct<int1:int,string1:string>>,ts:timestamp,"
-        + "decimal1:decimal(38,18)>", readerInspector.getTypeName());
+        + "decimal1:decimal(38,10)>", readerInspector.getTypeName());

Review comment:
       Change in decimal scale. Expected?

##########
File path: ql/src/test/results/clientpositive/llap/orc_file_dump.q.out
##########
@@ -249,15 +249,15 @@ Stripes:
       Entry 1: numHashFunctions: 4 bitCount: 6272 popCount: 182 loadFactor: 0.029 expectedFpp: 7.090246E-7
       Stripe level merge: numHashFunctions: 4 bitCount: 6272 popCount: 1772 loadFactor: 0.2825 expectedFpp: 0.0063713384
     Row group indices for column 9:
-      Entry 0: count: 1000 hasNull: false min: 2013-03-01 09:11:58.703 max: 2013-03-01 09:11:58.703 positions: 0,0,0,0,0,0
-      Entry 1: count: 49 hasNull: false min: 2013-03-01 09:11:58.703 max: 2013-03-01 09:11:58.703 positions: 0,7,488,0,1538,488
+      Entry 0: count: 1000 hasNull: false min: 2013-03-01 09:11:58.70307 max: 2013-03-01 09:11:58.703325 positions: 0,0,0,0,0,0
+      Entry 1: count: 49 hasNull: false min: 2013-03-01 09:11:58.703076 max: 2013-03-01 09:11:58.703325 positions: 0,7,488,0,1538,488
     Bloom filters for column 9:
       Entry 0: numHashFunctions: 4 bitCount: 6272 popCount: 4 loadFactor: 0.0006 expectedFpp: 1.6543056E-13
       Entry 1: numHashFunctions: 4 bitCount: 6272 popCount: 4 loadFactor: 0.0006 expectedFpp: 1.6543056E-13
       Stripe level merge: numHashFunctions: 4 bitCount: 6272 popCount: 4 loadFactor: 0.0006 expectedFpp: 1.6543056E-13
     Row group indices for column 10:
-      Entry 0: count: 1000 hasNull: false min: 8 max: 9994 sum: 5118211 positions: 0,0,0,0,0
-      Entry 1: count: 49 hasNull: false min: 248 max: 9490 sum: 246405 positions: 0,2194,0,4,488
+      Entry 0: count: 1000 hasNull: false min: 0.08 max: 99.94 sum: 51182.11 positions: 0,0,0,0,0

Review comment:
       min/max changed. Expected?

##########
File path: ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_multicol.q.out
##########
@@ -355,7 +355,7 @@ Stage-1 HIVE COUNTERS:
    RECORDS_OUT_OPERATOR_TS_3: 800
    TOTAL_TABLE_ROWS_WRITTEN: 0
 Stage-1 LLAP IO COUNTERS:
-   CACHE_HIT_BYTES: 138344
+   CACHE_MISS_BYTES: 138342

Review comment:
       Expected? Was it a typo?

##########
File path: ql/src/test/results/clientpositive/llap/orc_file_dump.q.out
##########
@@ -111,7 +111,7 @@ Stripe Statistics:
     Column 6: count: 1049 hasNull: false bytesOnDisk: 3323 min: 0.02 max: 49.85 sum: 26286.349999999977
     Column 7: count: 1049 hasNull: false bytesOnDisk: 137 true: 526
     Column 8: count: 1049 hasNull: false bytesOnDisk: 3430 min:  max: zach zipper sum: 13443
-    Column 9: count: 1049 hasNull: false bytesOnDisk: 1802 min: 2013-03-01 09:11:58.703 max: 2013-03-01 09:11:58.703
+    Column 9: count: 1049 hasNull: false bytesOnDisk: 1802 min: 2013-03-01 09:11:58.70307 max: 2013-03-01 09:11:58.703325

Review comment:
       Timestamp precision change. Expected?

##########
File path: ql/src/test/results/clientpositive/llap/acid_bloom_filter_orc_file_dump.q.out
##########
@@ -76,7 +76,7 @@ PREHOOK: Input: default@bloomtest
 #### A masked pattern was here ####
 -- BEGIN ORC FILE DUMP --
 #### A masked pattern was here ####
-File Version: 0.12 with ORC_517
+File Version: 0.12 with ORC_14

Review comment:
       Is this change expected?

##########
File path: ql/src/test/results/clientpositive/tez/orc_merge12.q.out
##########
@@ -162,8 +162,8 @@ Stripe Statistics:
     Column 6: count: 9174 hasNull: true min: -16379.0 max: 9763215.5639 sum: 5.62236530305E7
     Column 7: count: 12288 hasNull: false min: 00020767-dd8f-4f4d-bd68-4b7be64b8e44 max: fffa3516-e219-4027-b0d3-72bb2e676c52 sum: 442368
     Column 8: count: 12288 hasNull: false min: 000976f7-7075-4f3f-a564-5a375fafcc101416a2b7-7f64-41b7-851f-97d15405037e max: fffd0642-5f01-48cd-8d97-3428faee49e9b39f2b4c-efdc-4e5f-9ab5-4aa5394cb156 sum: 884736
-    Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 1969-12-31 16:00:30.808
-    Column 10: count: 9174 hasNull: true min: 1969-12-31 15:59:30.929 max: 1969-12-31 16:00:30.808
+    Column 9: count: 9173 hasNull: true min: 1969-12-31 15:59:30.929 max: 1969-12-31 16:00:30.808999999

Review comment:
       Change in timestamp precision in stats. Expected?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org