You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2020/12/27 20:54:19 UTC

[GitHub] [orc] pgaref opened a new pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

pgaref opened a new pull request #588:
URL: https://github.com/apache/orc/pull/588


   ### What changes were proposed in this pull request?
   RecordReaderImp should pass down the writer calendar info (writerUsedProlepticGregorian) when evaluating predicates to make sure column stats are properly deserialized (affects TimestampStatistics).
   
   
   ### Why are the changes needed?
   Correct evaluation of predicates with Timestamps.
   
   
   ### How was this patch tested?
   TestRecordReaderImpl.testPredEvalTimestampStatsDiffWriter
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #588:
URL: https://github.com/apache/orc/pull/588#discussion_r549571330



##########
File path: java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java
##########
@@ -593,7 +593,22 @@ static TruthValue evaluateTimestamp(OrcProto.ColumnStatistics stats,
     return RecordReaderImpl.evaluatePredicateProto(stats, predicate, null,
         encoding, null,
         include135 ? OrcFile.WriterVersion.ORC_135: OrcFile.WriterVersion.ORC_101,
-        TypeDescription.createTimestamp(), useUTCTimestamp);
+        TypeDescription.createTimestamp(), true, useUTCTimestamp);
+  }
+
+  static TruthValue evaluateTimestampWithWriterCalendar(OrcProto.ColumnStatistics stats,
+                                                        PredicateLeaf predicate,
+                                                        boolean include135,
+                                                        boolean writerUsedProlepticGregorian,
+                                                        boolean useUTCTimestamp) {
+    OrcProto.ColumnEncoding encoding =
+            OrcProto.ColumnEncoding.newBuilder()

Review comment:
       Could you follow the indentation style around here like line 577 and line 590?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #588:
URL: https://github.com/apache/orc/pull/588#discussion_r549571412



##########
File path: java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java
##########
@@ -593,7 +593,22 @@ static TruthValue evaluateTimestamp(OrcProto.ColumnStatistics stats,
     return RecordReaderImpl.evaluatePredicateProto(stats, predicate, null,
         encoding, null,
         include135 ? OrcFile.WriterVersion.ORC_135: OrcFile.WriterVersion.ORC_101,
-        TypeDescription.createTimestamp(), useUTCTimestamp);
+        TypeDescription.createTimestamp(), true, useUTCTimestamp);
+  }
+
+  static TruthValue evaluateTimestampWithWriterCalendar(OrcProto.ColumnStatistics stats,
+                                                        PredicateLeaf predicate,
+                                                        boolean include135,
+                                                        boolean writerUsedProlepticGregorian,
+                                                        boolean useUTCTimestamp) {
+    OrcProto.ColumnEncoding encoding =
+            OrcProto.ColumnEncoding.newBuilder()
+                    .setKind(OrcProto.ColumnEncoding.Kind.DIRECT)
+                    .build();
+    return RecordReaderImpl.evaluatePredicateProto(stats, predicate, null,
+            encoding, null,

Review comment:
       ditto.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #588:
URL: https://github.com/apache/orc/pull/588#discussion_r549244315



##########
File path: java/core/src/java/org/apache/orc/impl/ColumnStatisticsImpl.java
##########
@@ -2100,6 +2100,12 @@ public static ColumnStatisticsImpl deserialize(TypeDescription schema,
     return deserialize(schema, stats, true, true);
   }
 
+  public static ColumnStatisticsImpl deserialize(TypeDescription schema,
+                                                 OrcProto.ColumnStatistics stats,
+                                                 boolean writerUsedProlepticGregorian) {
+    return deserialize(schema, stats, writerUsedProlepticGregorian, true);
+  }
+

Review comment:
       If you don't mind, shall we remove this addition in order to keep the simplicity?
   
   This new `static` function is used once and we can remove this new API like the following.
   ```java
   - ColumnStatistics cs = ColumnStatisticsImpl.deserialize(null, statsProto, writerUsedProlepticGregorian);
   + ColumnStatistics cs = ColumnStatisticsImpl.deserialize(null, statsProto, writerUsedProlepticGregorian, true);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun merged pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #588:
URL: https://github.com/apache/orc/pull/588


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on a change in pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
pgaref commented on a change in pull request #588:
URL: https://github.com/apache/orc/pull/588#discussion_r549257563



##########
File path: java/core/src/java/org/apache/orc/impl/ColumnStatisticsImpl.java
##########
@@ -2100,6 +2100,12 @@ public static ColumnStatisticsImpl deserialize(TypeDescription schema,
     return deserialize(schema, stats, true, true);
   }
 
+  public static ColumnStatisticsImpl deserialize(TypeDescription schema,
+                                                 OrcProto.ColumnStatistics stats,
+                                                 boolean writerUsedProlepticGregorian) {
+    return deserialize(schema, stats, writerUsedProlepticGregorian, true);
+  }
+

Review comment:
       Sure, makes sense. Just updated the PR




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #588:
URL: https://github.com/apache/orc/pull/588#issuecomment-751982911


   Thanks again for the review @dongjoon-hyun  !  Latest OS upgrade wiped the IDE style settings for some reason which is quite frustrating -- should be back to normal now :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #588: ORC-705: Predicate evaluation should take into account writer calendar

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #588:
URL: https://github.com/apache/orc/pull/588#discussion_r549571575



##########
File path: java/core/src/test/org/apache/orc/impl/TestRecordReaderImpl.java
##########
@@ -895,6 +910,40 @@ public void testPredEvalWithDecimalStats() throws Exception {
         evaluateInteger(createDecimalStats("10.0", "100.0"), pred));
   }
 
+  @Test
+  public void testPredEvalTimestampStatsDiffWriter() {
+    // Proleptic - NoUTC
+    PredicateLeaf pred = createPredicateLeaf(PredicateLeaf.Operator.NULL_SAFE_EQUALS,
+            PredicateLeaf.Type.TIMESTAMP, "x", Timestamp.valueOf("1017-01-01 00:00:00"), null);

Review comment:
       indentation?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org