You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by majetideepak <gi...@git.apache.org> on 2017/04/20 15:17:23 UTC

[GitHub] orc pull request #111: ORC-137: [C++] Expose Indexes as API

GitHub user majetideepak opened a pull request:

    https://github.com/apache/orc/pull/111

    ORC-137: [C++] Expose Indexes as API

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/majetideepak/orc ORC-137

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #111
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/orc/pull/111


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by wgtmac <gi...@git.apache.org>.
Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r114249399
  
    --- Diff: c++/src/Statistics.hh ---
    @@ -605,6 +676,43 @@ namespace orc {
         }
       };
     
    +  class StripeStatisticsImpl: public StripeStatistics {
    +  private:
    +    std::unique_ptr<StatisticsImpl> columnStats;
    +    std::vector<std::vector<std::unique_ptr<const ColumnStatistics> > > rowIndexStats;
    --- End diff --
    
    Similar to my previous comment in ORC-87. If we need to be compatible with older C++ standards, this line may not work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r113782302
  
    --- Diff: c++/include/orc/Statistics.hh ---
    @@ -321,6 +321,32 @@ namespace orc {
          * @return maximum value
          */
         virtual int64_t getMaximum() const = 0;
    +
    +    /**
    +     * check whether column has a lowerBound
    +     * @return true if column has a lowerBound
    +     */
    +    virtual bool hasLowerBound() const = 0;
    --- End diff --
    
    I rebase this patch off ORC-87. That is why the changes appear in the overall patch.
    I just squashed the commits relevant to only this JIRA. There are two commits now. Once is the ORC-87 commit. The second commit is for this JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by wgtmac <gi...@git.apache.org>.
Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r114249800
  
    --- Diff: tools/src/FileStatistics.cc ---
    @@ -55,6 +55,11 @@ void printStatistics(const char *filename) {
             std::cout << "--- Column " << k << " ---" << std::endl;
             std::cout << stripeStats->getColumnStatistics(k)->toString()
                       << std::endl;
    +        for(unsigned int r = 0; r < stripeStats->getNumberOfRowIndexStats(k); ++r) {
    --- End diff --
    
    Is it better to make printing row index stats optional as it may be too verbose?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #111: ORC-137: [C++] Expose Indexes as API

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/111
  
    Rebases off PR#110. This should be committed after that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r114379730
  
    --- Diff: c++/src/Statistics.hh ---
    @@ -605,6 +676,43 @@ namespace orc {
         }
       };
     
    +  class StripeStatisticsImpl: public StripeStatistics {
    +  private:
    +    std::unique_ptr<StatisticsImpl> columnStats;
    +    std::vector<std::vector<std::unique_ptr<const ColumnStatistics> > > rowIndexStats;
    --- End diff --
    
    Correct! Making it a `shared_ptr`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r114337218
  
    --- Diff: c++/src/Statistics.hh ---
    @@ -605,6 +676,43 @@ namespace orc {
         }
       };
     
    +  class StripeStatisticsImpl: public StripeStatistics {
    +  private:
    +    std::unique_ptr<StatisticsImpl> columnStats;
    +    std::vector<std::vector<std::unique_ptr<const ColumnStatistics> > > rowIndexStats;
    --- End diff --
    
    If using `unique_ptr` is your concern, I believe this is handled by the ORC Adaptor header.
    `./c++/src/Adaptor.hh:86:  #define unique_ptr auto_ptr`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #111: ORC-173: [C++] Expose Indexes as API

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/111
  
    Currently, statistics of all types before WriterVersion HIVE-8732 are ignored.
    Do you want to limit the check only to `string`, `date`, and `decimal` statistics ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by wgtmac <gi...@git.apache.org>.
Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r114367188
  
    --- Diff: c++/src/Statistics.hh ---
    @@ -605,6 +676,43 @@ namespace orc {
         }
       };
     
    +  class StripeStatisticsImpl: public StripeStatistics {
    +  private:
    +    std::unique_ptr<StatisticsImpl> columnStats;
    +    std::vector<std::vector<std::unique_ptr<const ColumnStatistics> > > rowIndexStats;
    --- End diff --
    
    auto_ptr is not copy-assignable so we can't create a container of auto_ptr. This is the problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #111: ORC-173: [C++] Expose Indexes as API

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/111
  
    * You've introduced a trailing space in FileStatistics.cc.
    * It would be good if you switched to using getopt_long in FileStatistics, like we did in FileMetadata.cc.
    * Since you added a new example file, you should add it to TestMatch.cc and make an expected file for it.
    * Please keep your column width to 100 characters.
    * You should remove min & max from string, date, and decimal statistics for files before HIVE-8732.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #111: ORC-173: [C++] Expose Indexes as API

Posted by wgtmac <gi...@git.apache.org>.
Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/111#discussion_r113775214
  
    --- Diff: c++/include/orc/Statistics.hh ---
    @@ -321,6 +321,32 @@ namespace orc {
          * @return maximum value
          */
         virtual int64_t getMaximum() const = 0;
    +
    +    /**
    +     * check whether column has a lowerBound
    +     * @return true if column has a lowerBound
    +     */
    +    virtual bool hasLowerBound() const = 0;
    --- End diff --
    
    I saw these lines of changes are also in ORC-87. Is it a good idea to remove them here to make this change more concise?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---