You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@quickstep.apache.org by hakanmemisoglu <gi...@git.apache.org> on 2016/09/02 20:08:17 UTC

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

GitHub user hakanmemisoglu opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/98

    QUICKSTEP-53 DateLit improvements

    This PR changes DateLit represantation:
    
    - Encode year, month and day information into an u32 field instead of keeping them separate in three different fields. The change reduces the size of DateLit from 48 bits to 32 bits.
    - With the internal representation change, we changed the comparison operators' implementation. They now use one unsigned integer comparison for each method.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep date-representation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/98.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #98
    
----
commit 3c8708dfd0987b3491bcabb94e77dd95ea0aea10
Author: Harshad Deshmukh <hb...@apache.org>
Date:   2016-08-29T19:03:52Z

    Separate Date type from Datetime type.
    
    - Beginning from the parser to the operator execution level, treat Date type
      sepearately than the Datetime type.
    - Provide support for the extract function when applied on the Date
      type, implemented in the DateExtractOperation.
    - Modified the tests.

commit 998994eee67774d24b4a31a8aea3b1924b0e04ea
Author: Hakan Memisoglu <ha...@gmail.com>
Date:   2016-09-02T19:53:47Z

    New representation and comparison operators for DateLit.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/98#discussion_r77752076
  
    --- Diff: types/DatetimeLit.hpp ---
    @@ -51,53 +54,70 @@ struct DateLit {
             + 1   // -
             + 2;  // Day
     
    +  // Years should be between [-kMaxYear, +kMaxYear] inclusive both end.
    +  static constexpr std::int32_t kMaxYear = 99999;
    +  static constexpr std::uint8_t kBitsNeededForDay = 5u;
    +  static constexpr std::uint8_t kBitsNeededForMonth = 4u;
    +
       static DateLit Create(const std::int32_t _year,
                             const std::uint8_t _month,
                             const std::uint8_t _day) {
         DateLit date;
    -    date.year = _year;
    -    date.month = _month;
    -    date.day = _day;
    +    // Normalize year by adding kMaxYear value, because we try to
    +    // encode signed year value into an unsigned integer.
    --- End diff --
    
    Yes, it does. This means that the leap year calculations are indeed based on the original year and not the normalized one, which is correct behavior. Thanks for the clarification! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by hakanmemisoglu <gi...@git.apache.org>.
Github user hakanmemisoglu closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/98


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by hakanmemisoglu <gi...@git.apache.org>.
Github user hakanmemisoglu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/98#discussion_r77749586
  
    --- Diff: types/DatetimeLit.hpp ---
    @@ -51,53 +54,70 @@ struct DateLit {
             + 1   // -
             + 2;  // Day
     
    +  // Years should be between [-kMaxYear, +kMaxYear] inclusive both end.
    +  static constexpr std::int32_t kMaxYear = 99999;
    +  static constexpr std::uint8_t kBitsNeededForDay = 5u;
    +  static constexpr std::uint8_t kBitsNeededForMonth = 4u;
    +
       static DateLit Create(const std::int32_t _year,
                             const std::uint8_t _month,
                             const std::uint8_t _day) {
         DateLit date;
    -    date.year = _year;
    -    date.month = _month;
    -    date.day = _day;
    +    // Normalize year by adding kMaxYear value, because we try to
    +    // encode signed year value into an unsigned integer.
    --- End diff --
    
    Hi @hbdeshmukh 
    In the code, the leap year check and other checks such whether the year is valid or not, are done within `printValueToString() `and `parseValueFromString() `. These methods use `yearField()`, `monthField()`, `dayField()` to decode the real number from the representation. I think it should be OK. 
    I did not understand actually why it should provide:
    > if _year is a leap year, is there a guarantee that _year + kMaxYear is also a leap year? 
    
    Or do I miss some other leap year checks done in the code?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by hakanmemisoglu <gi...@git.apache.org>.
Github user hakanmemisoglu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    @zuyu I don't have the number. I'll post it when I have.
    
    > I don't think so. The existing represent of one uint32_t and two uint8_ts would take 64 bits in total due to memory alignment requirements, instead of 48 bits as what you thought. 
    
    Yes, I checked it, you are right. I did not know structs as themselves have a natural alignment. It will  be padded with 2 byte because first member.
    
    However, this does not make my argument false, even supports it. As far as I know we store columns in natural format, then when we process them, we convert them into series of TypedValues. So keeping the sizes of structs small will help when we keep DateLit in storage memory.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by hakanmemisoglu <gi...@git.apache.org>.
Github user hakanmemisoglu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/98#discussion_r77751465
  
    --- Diff: types/DatetimeLit.hpp ---
    @@ -51,53 +54,70 @@ struct DateLit {
             + 1   // -
             + 2;  // Day
     
    +  // Years should be between [-kMaxYear, +kMaxYear] inclusive both end.
    +  static constexpr std::int32_t kMaxYear = 99999;
    +  static constexpr std::uint8_t kBitsNeededForDay = 5u;
    +  static constexpr std::uint8_t kBitsNeededForMonth = 4u;
    +
       static DateLit Create(const std::int32_t _year,
                             const std::uint8_t _month,
                             const std::uint8_t _day) {
         DateLit date;
    -    date.year = _year;
    -    date.month = _month;
    -    date.day = _day;
    +    // Normalize year by adding kMaxYear value, because we try to
    +    // encode signed year value into an unsigned integer.
    --- End diff --
    
    **L125** `std::int32_t result_year = lhs.yearField() + (rhs.months / 12);`
    **L126** `std::uint8_t result_month = static_cast<std::uint8_t>(lhs.monthField()) + (rhs.months % 12);`
    **L134** `ClampDayOfMonth(result_year, result_month, lhs.dayField()));`
    
    - Access to the struct happens via these methods (yearField(), monthField() etc...) that decodes from representation (normalized and shifted) to the year value (which is -2000 in your example). 
    - After the operation + is done, and the check is applied. 
    - The result year, month, day is calculated and given to `DateLit::Create()`
    -  `DateLit::Create()`encodes the real values to the unified representation.
    
    I might be missing something, but does the execution above make sense?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/98#discussion_r77748322
  
    --- Diff: types/DatetimeLit.hpp ---
    @@ -51,53 +54,70 @@ struct DateLit {
             + 1   // -
             + 2;  // Day
     
    +  // Years should be between [-kMaxYear, +kMaxYear] inclusive both end.
    +  static constexpr std::int32_t kMaxYear = 99999;
    +  static constexpr std::uint8_t kBitsNeededForDay = 5u;
    +  static constexpr std::uint8_t kBitsNeededForMonth = 4u;
    +
       static DateLit Create(const std::int32_t _year,
                             const std::uint8_t _month,
                             const std::uint8_t _day) {
         DateLit date;
    -    date.year = _year;
    -    date.month = _month;
    -    date.day = _day;
    +    // Normalize year by adding kMaxYear value, because we try to
    +    // encode signed year value into an unsigned integer.
    --- End diff --
    
    Hi @hakanmemisoglu 
    
    What's the impact on the leap year calculation because of this normalization (i.e. adding kMaxYear value to the negative value of year). More specifically, if ``_year`` is a leap year, is there a guarantee that ``_year + kMaxYear`` is also a leap year?  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/98#discussion_r77553615
  
    --- Diff: types/operations/unary_operations/CMakeLists.txt ---
    @@ -1,4 +1,4 @@
    -# Licensed to the Apache Software Foundation (ASF) under one
    +# Licensed to the Agache Software Foundation (ASF) under one
    --- End diff --
    
    ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    @hakanmemisoglu Again, we rarely use `DateLit` directly. On the other hand, when using `TypedValue`, any changes to `DateLit` does not save memory space due to the union alignment inside `TypedValue`. In our case, it would always be 64 bits.
    
    So I suggest we have the numbers before merging this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by hakanmemisoglu <gi...@git.apache.org>.
Github user hakanmemisoglu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    @zuyu  This will be merged after @hbdeshmukh's changes. Since we are use rebasing, I don't think it will be a problem. 
    For the efficiency part:
    1-) Using new comparison operators obviously are faster than old counterparts using if else statements. (Of course the overall effect might not be fast as we expected in the TPCH queries.)  Thank you.
    2-) No matter why TypedValue is used in query processing , the cutting the size of struct representation by 50% improves the memory usage:
     a-)  A page can contain 50% more values.
     b-) Recent x86 architectures favor using the values 32 bits or 64 bits, not 48 bits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by pateljm <gi...@git.apache.org>.
Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    Nice observation and enhancement @hakanmemisoglu!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    > For the efficiency part:
    1-) Using new comparison operators obviously are faster than old counterparts using if else statements. (Of course the overall effect might not be fast as we expected in the TPCH queries.) Thank you.
    
    I agree there would be some improvements in the theory, but do you have the number?
    
    > 2-) No matter why TypedValue is used in query processing , the cutting the size of struct representation by 50% improves the memory usage:
    a-) A page can contain 50% more values.
    b-) Recent x86 architectures favor using the values 32 bits or 64 bits, not 48 bits.
    
    I don't think so. The existing represent of one `uint32_t` and two `uint8_t`s would take 64 bits in total due to memory alignment requirements, instead of 48 bits as what you thought. Furthermore, as we use `TypeValue` instead of `DateLit` directly, I don't see much space improvements after refactoring using one single `uint32_t`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    Hi @zuyu 
    
    >IMHO, such DateLit encode is not necessary, as inside TypedValue it stores in a 64-byte union space.
    For the base table storage, your observation is correct. However, for evaluation of the EXTRACT and comparisons on the date, the compact representation should certainly be helpful. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    @hbdeshmukh I guess the possible performance improvements come from a) using `DateLit` instead of `DatetimeLit`, b) a better implementation of `Extract` function, but not the internal representation of `DateLit`.
    
    @hakanmemisoglu Could you please split the changes in this PR into multiple new PRs, and then compare the performance only between two different representation of `DateLit`? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by hakanmemisoglu <gi...@git.apache.org>.
Github user hakanmemisoglu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    I will close this PR since an issue in the compression code should be handled first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #98: QUICKSTEP-53 DateLit improvements

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/98
  
    @hakanmemisoglu My 2 cents:
    
     1. IMHO, such `DateLit` encode is not necessary, as inside `TypedValue` it stores in a 64-byte `union` space.
     2. This PR shares many changes with #97. Thus, you may want to work w/ @hbdeshmukh to reduce the common changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #98: QUICKSTEP-53 DateLit improvements

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/98#discussion_r77750246
  
    --- Diff: types/DatetimeLit.hpp ---
    @@ -51,53 +54,70 @@ struct DateLit {
             + 1   // -
             + 2;  // Day
     
    +  // Years should be between [-kMaxYear, +kMaxYear] inclusive both end.
    +  static constexpr std::int32_t kMaxYear = 99999;
    +  static constexpr std::uint8_t kBitsNeededForDay = 5u;
    +  static constexpr std::uint8_t kBitsNeededForMonth = 4u;
    +
       static DateLit Create(const std::int32_t _year,
                             const std::uint8_t _month,
                             const std::uint8_t _day) {
         DateLit date;
    -    date.year = _year;
    -    date.month = _month;
    -    date.day = _day;
    +    // Normalize year by adding kMaxYear value, because we try to
    +    // encode signed year value into an unsigned integer.
    --- End diff --
    
    The check for leap year plays a role In the implementations for operator + and - for the date type. For instance, to perform an operation like ``-2000-29-01 + 13 months``, we need to know if the resultant year will be leap or not. It seems that this check for leap year will be performed on the normalized year instead of the original year. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---