You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2020/09/08 09:22:56 UTC

[GitHub] [orc] wgtmac opened a new pull request #543: ORC-611: [C++] Support timestamp statistics with nanosecond

wgtmac opened a new pull request #543:
URL: https://github.com/apache/orc/pull/543


   ### What changes were proposed in this pull request?
   1. Store nanosecond into TimestampColumnStatistics in C++ writer.
   2. Be aware of nanosecond in the TimestampColumnStatistics.
   3. Adde new functions in TimestampColumnStatistics to get nanosecond while keeping backward compatibility.
   4. Fix PPD to utilize nanosecond or use a default value for timestamp columns.
   
   ### Why are the changes needed?
   To be consistent with the java side.
   
   ### How was this patch tested?
   Several new unit tests have been added to TestPredicateLeaf.cc, TestColumnStatistics.cc, and TestSearchArgument.cc.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513174385



##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -36,6 +36,33 @@ namespace orc {
    */
   class Literal {
   public:
+    struct Timestamp {
+      Timestamp() = default;
+      Timestamp(const Timestamp&) = default;
+      Timestamp(Timestamp&&) = default;
+      ~Timestamp() = default;
+      Timestamp(int64_t second_, int32_t nano_): second(second_), nano(nano_) {
+        // PASS
+      }
+      Timestamp& operator=(const Timestamp&) = default;
+      Timestamp& operator=(Timestamp&&) = default;
+      bool operator==(const Timestamp& r) const {
+        return second == r.second && nano == r.nano;
+      }
+      bool operator<(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano < r.nano);
+      }
+      bool operator<=(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano <= r.nano);
+      }
+      bool operator!=(const Timestamp& r) const { return !(*this == r); }
+      bool operator>(const Timestamp& r) const { return r < *this; }
+      bool operator>=(const Timestamp& r) const { return r <= *this; }
+      int64_t getMillis() const { return second * 1000 + nano / 1000000; }

Review comment:
       Could you add a test coverage fr `getMillis` function? I cannot find one so far.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-715596374


   Thank you for pinging me, @wgtmac . Sure. I'll review this tomorrow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-717538665


   Thank you for update. Sure, I'll review this tonight~


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513174009



##########
File path: c++/src/sargs/PredicateLeaf.cc
##########
@@ -542,11 +543,21 @@ namespace orc {
             colStats.timestampstatistics().has_minimumutc() &&
             colStats.timestampstatistics().has_maximumutc()) {
           const auto& stats = colStats.timestampstatistics();
+          int32_t minNano = stats.has_minimumnanos() ?
+            stats.minimumnanos() - 1 : 0;
+          int32_t maxNano = stats.has_maximumnanos() ?
+            stats.maximumnanos() - 1 : 999999;

Review comment:
       Shall we use `DEFAULT_MAX_NANOS` instead of `999999`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513173929



##########
File path: c++/src/sargs/PredicateLeaf.cc
##########
@@ -542,11 +543,21 @@ namespace orc {
             colStats.timestampstatistics().has_minimumutc() &&
             colStats.timestampstatistics().has_maximumutc()) {
           const auto& stats = colStats.timestampstatistics();
+          int32_t minNano = stats.has_minimumnanos() ?
+            stats.minimumnanos() - 1 : 0;

Review comment:
       Shall we use `DEFAULT_MIN_NANOS` instead of `0`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540754



##########
File path: c++/src/Statistics.cc
##########
@@ -317,6 +317,8 @@ namespace orc {
       _stats.setMaximum(0);
       _lowerBound = 0;
       _upperBound = 0;
+      _minimumNano = DEFAULT_MIN_NANOS;

Review comment:
       We already use `nanos` in `stats.has_minimumnanos()`, too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540943



##########
File path: c++/src/Statistics.hh
##########
@@ -1298,9 +1356,17 @@ namespace orc {
       if (_stats.hasMinimum()) {
         tsStats->set_minimumutc(_stats.getMinimum());
         tsStats->set_maximumutc(_stats.getMaximum());
+        if (_minimumNano != DEFAULT_MIN_NANOS) {
+          tsStats->set_minimumnanos(_minimumNano + 1);

Review comment:
       Please add a comment about the reason why we use `+1` here, too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
wgtmac commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r514739760



##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -36,6 +36,33 @@ namespace orc {
    */
   class Literal {
   public:
+    struct Timestamp {
+      Timestamp() = default;
+      Timestamp(const Timestamp&) = default;
+      Timestamp(Timestamp&&) = default;
+      ~Timestamp() = default;
+      Timestamp(int64_t second_, int32_t nano_): second(second_), nano(nano_) {
+        // PASS
+      }
+      Timestamp& operator=(const Timestamp&) = default;
+      Timestamp& operator=(Timestamp&&) = default;
+      bool operator==(const Timestamp& r) const {
+        return second == r.second && nano == r.nano;
+      }
+      bool operator<(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano < r.nano);
+      }
+      bool operator<=(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano <= r.nano);
+      }
+      bool operator!=(const Timestamp& r) const { return !(*this == r); }
+      bool operator>(const Timestamp& r) const { return r < *this; }
+      bool operator>=(const Timestamp& r) const { return r <= *this; }
+      int64_t getMillis() const { return second * 1000 + nano / 1000000; }

Review comment:
       Added a test for getMillis. nano / 1000000 is fine because nano only has 9 digits and it will not be zero if it has more than 6 digits.

##########
File path: c++/src/sargs/PredicateLeaf.cc
##########
@@ -510,6 +510,7 @@ namespace orc {
         break;
       }
       case PredicateDataType::STRING: {
+        ///TODO: check lowerBound and upperBound as well

Review comment:
       I will fix it in a separate JIRA and patch. But here I want to leave a mark in case I forget.

##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -36,6 +36,33 @@ namespace orc {
    */
   class Literal {
   public:
+    struct Timestamp {
+      Timestamp() = default;
+      Timestamp(const Timestamp&) = default;
+      Timestamp(Timestamp&&) = default;
+      ~Timestamp() = default;
+      Timestamp(int64_t second_, int32_t nano_): second(second_), nano(nano_) {
+        // PASS
+      }
+      Timestamp& operator=(const Timestamp&) = default;
+      Timestamp& operator=(Timestamp&&) = default;
+      bool operator==(const Timestamp& r) const {
+        return second == r.second && nano == r.nano;
+      }
+      bool operator<(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano < r.nano);
+      }
+      bool operator<=(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano <= r.nano);
+      }
+      bool operator!=(const Timestamp& r) const { return !(*this == r); }
+      bool operator>(const Timestamp& r) const { return r < *this; }
+      bool operator>=(const Timestamp& r) const { return r <= *this; }
+      int64_t getMillis() const { return second * 1000 + nano / 1000000; }
+      int64_t second;
+      int32_t nano;

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513173658



##########
File path: c++/src/sargs/PredicateLeaf.cc
##########
@@ -510,6 +510,7 @@ namespace orc {
         break;
       }
       case PredicateDataType::STRING: {
+        ///TODO: check lowerBound and upperBound as well

Review comment:
       This looks like irrelevant to `timestamp`. If we need this, can we do it in a separate JIRA, please?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
wgtmac commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r512518148



##########
File path: c++/src/sargs/PredicateLeaf.cc
##########
@@ -510,6 +510,7 @@ namespace orc {
         break;
       }
       case PredicateDataType::STRING: {
+        ///FIXME: check lowerBound and upperBound as well

Review comment:
       yes

##########
File path: c++/test/TestPredicateLeaf.cc
##########
@@ -145,6 +145,22 @@ namespace orc {
     return colStats;
   }
 
+  static proto::ColumnStatistics createTimestampStats(
+                                    int64_t minSecond, int32_t minNano,

Review comment:
       I think we have: https://orc.apache.org/develop/coding/

##########
File path: c++/include/orc/Statistics.hh
##########
@@ -352,7 +352,17 @@ namespace orc {
      */
     virtual int64_t getUpperBound() const = 0;
 
+    /**
+     * Get the last 6 digits of nanosecond of minimum timestamp.
+     * @return last 6 digits of nanosecond of minimum timestamp.
+     */
+    virtual int32_t getMinimumNano() const = 0;
 
+    /**
+     * Get the last 6 digits of nanosecond of maximum timestamp.
+     * @return last 6 digits of nanosecond of maximum timestamp.
+     */
+    virtual int32_t getMaximumNano() const = 0;

Review comment:
       done

##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -63,10 +90,15 @@ namespace orc {
     Literal(bool val);
 
     /**
-     * Create a literal of Timestamp or DATE type

Review comment:
       This is fine. Predicate pushdown in C++ is not complete yet and I have another patch waiting for review.

##########
File path: c++/src/Statistics.hh
##########
@@ -1298,9 +1356,17 @@ namespace orc {
       if (_stats.hasMinimum()) {
         tsStats->set_minimumutc(_stats.getMinimum());
         tsStats->set_maximumutc(_stats.getMaximum());
+        if (_minimumNano != DEFAULT_MIN_NANOS) {
+          tsStats->set_minimumnanos(_minimumNano + 1);

Review comment:
       just to be consistent with java side.

##########
File path: c++/include/orc/Statistics.hh
##########
@@ -352,7 +352,17 @@ namespace orc {
      */
     virtual int64_t getUpperBound() const = 0;
 
+    /**
+     * Get the last 6 digits of nanosecond of minimum timestamp.
+     * @return last 6 digits of nanosecond of minimum timestamp.
+     */
+    virtual int32_t getMinimumNano() const = 0;

Review comment:
       done

##########
File path: c++/src/Statistics.hh
##########
@@ -1214,6 +1214,10 @@ namespace orc {
     bool _hasUpperBound;
     int64_t _lowerBound;
     int64_t _upperBound;
+    int32_t _minimumNano;  // last 6 digits of nanosecond of minimum timestamp

Review comment:
       done

##########
File path: c++/src/Statistics.cc
##########
@@ -327,6 +329,10 @@ namespace orc {
                 (stats.has_maximum() && (statContext.writerTimezone != nullptr)));
       _hasLowerBound = stats.has_minimumutc() || stats.has_minimum();
       _hasUpperBound = stats.has_maximumutc() || stats.has_maximum();
+      _minimumNano = stats.has_minimumnanos() ?
+                     stats.minimumnanos() - 1 : DEFAULT_MIN_NANOS;

Review comment:
       sure

##########
File path: c++/src/Statistics.cc
##########
@@ -317,6 +317,8 @@ namespace orc {
       _stats.setMaximum(0);
       _lowerBound = 0;
       _upperBound = 0;
+      _minimumNano = DEFAULT_MIN_NANOS;

Review comment:
       done

##########
File path: c++/src/Statistics.hh
##########
@@ -1379,6 +1445,22 @@ namespace orc {
         throw ParseError("UpperBound is not defined.");
       }
     }
+
+    int32_t getMinimumNano() const override {

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
wgtmac commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-714883468


   @dongjoon-hyun Can you take a look at this? Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540726



##########
File path: c++/src/Statistics.cc
##########
@@ -317,6 +317,8 @@ namespace orc {
       _stats.setMaximum(0);
       _lowerBound = 0;
       _upperBound = 0;
+      _minimumNano = DEFAULT_MIN_NANOS;

Review comment:
       This looks inconsistent. We use `NANOS` in `DEFAULT_MIN_NANOS` and `Nano` in `_minimumNano`.
   Maybe, `_minimumNano` -> `_minimumNanos`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540450



##########
File path: c++/include/orc/Statistics.hh
##########
@@ -352,7 +352,17 @@ namespace orc {
      */
     virtual int64_t getUpperBound() const = 0;
 
+    /**
+     * Get the last 6 digits of nanosecond of minimum timestamp.
+     * @return last 6 digits of nanosecond of minimum timestamp.
+     */
+    virtual int32_t getMinimumNano() const = 0;
 
+    /**
+     * Get the last 6 digits of nanosecond of maximum timestamp.
+     * @return last 6 digits of nanosecond of maximum timestamp.
+     */
+    virtual int32_t getMaximumNano() const = 0;

Review comment:
       `getMaximumNano` -> `getMaximumNanos`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] pgaref commented on pull request #543: ORC-611: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
pgaref commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-688900782


   Hey @wgtmac thanks for taking care of this.
   Would it make sense to create a separate child-ticket for the C++ fix? ORC-611 is already merged and could cause confusion


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513172346



##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -63,10 +90,15 @@ namespace orc {
     Literal(bool val);
 
     /**
-     * Create a literal of Timestamp or DATE type
+     * Create a literal of DATE type
      */
     Literal(PredicateDataType type, int64_t val);
 
+    /**
+     * Create a literal of TIMESTAMP type
+     */
+    Literal(int64_t second, int32_t nano);

Review comment:
       ```c++
   - Literal(int64_t second, int32_t nano);
   + Literal(int64_t second, int32_t nanos);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540987



##########
File path: c++/src/Statistics.hh
##########
@@ -1379,6 +1445,22 @@ namespace orc {
         throw ParseError("UpperBound is not defined.");
       }
     }
+
+    int32_t getMinimumNano() const override {

Review comment:
       `getMinimumNano` -> `getMinimumNanos`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511539389



##########
File path: c++/src/sargs/PredicateLeaf.cc
##########
@@ -510,6 +510,7 @@ namespace orc {
         break;
       }
       case PredicateDataType::STRING: {
+        ///FIXME: check lowerBound and upperBound as well

Review comment:
       Is this `TODO` item?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540445



##########
File path: c++/include/orc/Statistics.hh
##########
@@ -352,7 +352,17 @@ namespace orc {
      */
     virtual int64_t getUpperBound() const = 0;
 
+    /**
+     * Get the last 6 digits of nanosecond of minimum timestamp.
+     * @return last 6 digits of nanosecond of minimum timestamp.
+     */
+    virtual int32_t getMinimumNano() const = 0;

Review comment:
       According to the `orc_proto.proto`, `getMinimumNano` -> `getMinimumNanos`?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac edited a comment on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
wgtmac edited a comment on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-688918948


   > Hey @wgtmac thanks for taking care of this.
   > Would it make sense to create a separate child-ticket for the C++ fix? [ORC-611](https://issues.apache.org/jira/browse/ORC-611) is already merged and could cause confusion
   
   Thanks for the suggestion. @pgaref  Please review this patch if you get the chance. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
wgtmac commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-688918948


   > Hey @wgtmac thanks for taking care of this.
   > Would it make sense to create a separate child-ticket for the C++ fix? [ORC-611](https://issues.apache.org/jira/browse/ORC-611) is already merged and could cause confusion
   
   Thanks for the suggestion. Please review this patch if you get the chance. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-717747511


   BTW, @wgtmac . I fixed the AppVeyor failure in the master branch. Could you rebase this PR to the master branch? I hope to see AppVeyor result because this is C++ PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540611



##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -63,10 +90,15 @@ namespace orc {
     Literal(bool val);
 
     /**
-     * Create a literal of Timestamp or DATE type

Review comment:
       This looks like a breaking change at `Literal` class. Can we create `Timestamp` literal with `Literal(PredicateDataType type, int64_t val)` like before?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540886



##########
File path: c++/src/Statistics.hh
##########
@@ -1214,6 +1214,10 @@ namespace orc {
     bool _hasUpperBound;
     int64_t _lowerBound;
     int64_t _upperBound;
+    int32_t _minimumNano;  // last 6 digits of nanosecond of minimum timestamp

Review comment:
       `_minimumNano` -> `_minimumNanos`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-718371047


   Hi, @wgtmac
   Could you update once more please?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513172232



##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -36,6 +36,33 @@ namespace orc {
    */
   class Literal {
   public:
+    struct Timestamp {
+      Timestamp() = default;
+      Timestamp(const Timestamp&) = default;
+      Timestamp(Timestamp&&) = default;
+      ~Timestamp() = default;
+      Timestamp(int64_t second_, int32_t nano_): second(second_), nano(nano_) {
+        // PASS
+      }
+      Timestamp& operator=(const Timestamp&) = default;
+      Timestamp& operator=(Timestamp&&) = default;
+      bool operator==(const Timestamp& r) const {
+        return second == r.second && nano == r.nano;
+      }
+      bool operator<(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano < r.nano);
+      }
+      bool operator<=(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano <= r.nano);
+      }
+      bool operator!=(const Timestamp& r) const { return !(*this == r); }
+      bool operator>(const Timestamp& r) const { return r < *this; }
+      bool operator>=(const Timestamp& r) const { return r <= *this; }
+      int64_t getMillis() const { return second * 1000 + nano / 1000000; }
+      int64_t second;
+      int32_t nano;

Review comment:
       Can we use `nanos` which denotes `nano seconds`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511539665



##########
File path: c++/test/TestPredicateLeaf.cc
##########
@@ -145,6 +145,22 @@ namespace orc {
     return colStats;
   }
 
+  static proto::ColumnStatistics createTimestampStats(
+                                    int64_t minSecond, int32_t minNano,

Review comment:
       Just a question. Do we have an indentation rule in Apache ORC C++ module?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540943



##########
File path: c++/src/Statistics.hh
##########
@@ -1298,9 +1356,17 @@ namespace orc {
       if (_stats.hasMinimum()) {
         tsStats->set_minimumutc(_stats.getMinimum());
         tsStats->set_maximumutc(_stats.getMaximum());
+        if (_minimumNano != DEFAULT_MIN_NANOS) {
+          tsStats->set_minimumnanos(_minimumNano + 1);

Review comment:
       Please add a comment about the reason why we use `+1` here, too. Otherwise, it looks like a magic number.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] wgtmac commented on pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
wgtmac commented on pull request #543:
URL: https://github.com/apache/orc/pull/543#issuecomment-719141900


   > Hi, @wgtmac
   > Could you update once more please?
   
   Done. Please check it again. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r511540833



##########
File path: c++/src/Statistics.cc
##########
@@ -327,6 +329,10 @@ namespace orc {
                 (stats.has_maximum() && (statContext.writerTimezone != nullptr)));
       _hasLowerBound = stats.has_minimumutc() || stats.has_minimum();
       _hasUpperBound = stats.has_maximumutc() || stats.has_maximum();
+      _minimumNano = stats.has_minimumnanos() ?
+                     stats.minimumnanos() - 1 : DEFAULT_MIN_NANOS;

Review comment:
       Could you add some comment about why why we use `-1` here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] dongjoon-hyun commented on a change in pull request #543: ORC-663: [C++] Support timestamp statistics with nanosecond

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #543:
URL: https://github.com/apache/orc/pull/543#discussion_r513172034



##########
File path: c++/include/orc/sargs/Literal.hh
##########
@@ -36,6 +36,33 @@ namespace orc {
    */
   class Literal {
   public:
+    struct Timestamp {
+      Timestamp() = default;
+      Timestamp(const Timestamp&) = default;
+      Timestamp(Timestamp&&) = default;
+      ~Timestamp() = default;
+      Timestamp(int64_t second_, int32_t nano_): second(second_), nano(nano_) {
+        // PASS
+      }
+      Timestamp& operator=(const Timestamp&) = default;
+      Timestamp& operator=(Timestamp&&) = default;
+      bool operator==(const Timestamp& r) const {
+        return second == r.second && nano == r.nano;
+      }
+      bool operator<(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano < r.nano);
+      }
+      bool operator<=(const Timestamp& r) const {
+        return second < r.second || (second == r.second && nano <= r.nano);
+      }
+      bool operator!=(const Timestamp& r) const { return !(*this == r); }
+      bool operator>(const Timestamp& r) const { return r < *this; }
+      bool operator>=(const Timestamp& r) const { return r <= *this; }
+      int64_t getMillis() const { return second * 1000 + nano / 1000000; }

Review comment:
       Is `nano / 1000000` safe? It looks like `0` always due to the integer division.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org