You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by xndai <gi...@git.apache.org> on 2017/08/10 00:11:21 UTC

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

GitHub user xndai opened a pull request:

    https://github.com/apache/orc/pull/151

    ORC-226 Support getWriterId in c++ reader interface

    Add new interface for reader to retrieve writer ID.
    
    Change-Id: I87939e448aa5eab1bc7ed728404ddbf41334d809

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xndai/orc dev_writerid

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/151.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #151
    
----
commit 0269e8a7df72b34f0af83f668f21ac627e00f6d3
Author: Xiening.Dai <xn...@live.com>
Date:   2017-08-10T00:09:43Z

    ORC-226 Support getWriterId in c++ reader interface
    
    Add new interface for reader to retrieve writer ID.
    
    Change-Id: I87939e448aa5eab1bc7ed728404ddbf41334d809

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r133330343
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    Then we need a logger in Orc library which we don't have now. What's the problem of returning actual value instead of ORC_FUTURE?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r132734446
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    No, it won't. Dain from the Presto team was the one that suggested adding the field.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r133330120
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    Maybe we should print an error message with the actual id number if it isn't recognized. That would let you debug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/151
  
    @majetideepak how do I do that? :)


---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r133008869
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    Ok, I see. But I am still not sure about returning ORC_FUTURE. The caller may want to obtain the exact writer ID for debugging purpose. It can choose to ignore if it wants. We just give them one more option. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r138235902
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,18 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * @return UNKNOWN_WRITER if the writer ID is undefined
    +     */
    +    virtual WriterId getWriterId() const = 0;
    +
    +    /**
    +     * Get the writer id value when getWriterId() returns an unknown writer.
    +     * @return the integer value of the writer ID.
    +     */
    +    virtual int getUnknownWriterIdValue() const = 0;
    --- End diff --
    
    Sounds good. I will update it. Thanks.


---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/151
  
    Ok, this is generally good. A couple of points that I'll fix as part of committing:
    
    * the API should use specific types (eg. uint32_t instead of int)
    * since the enums are for serialization, I think it will be clearer if we assign explicit values for them
    * if the footer doesn't have a writerId, it is ORC_JAVA_WRITER and not an exception



---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r132571311
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    If it isn't there, it means ORC Java. Let's make an enum that matches this:
    
        enum WriterId { ORC_Java = 0, ORC_CPP = 1, ORC_Presto = 2, ORC_FUTURE = INT_MAX }
    
    Any unknown code should return ORC_FUTURE.
    
       virtual WriterId getWriterId() const = 0;


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by ajayyadava <gi...@git.apache.org>.
Github user ajayyadava commented on the issue:

    https://github.com/apache/orc/pull/151
  
    @xndai You can use the following command to squash your last K commits into 1:
    git reset --soft HEAD~K && git commit
    e.g. to squash last 2 commits into one, replace K with 2
    git reset --soft HEAD~2 && git commit
    
    I will advise that you create a separate branch from the branch you want to commit and try it on that first.



---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/orc/pull/151


---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r132734767
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    Re-reading your question, I should clarify that the Presto writer hasn't been released yet. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/151
  
    @omalley please take another look. thanks.


---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/151
  
    @majetideepak pls take another look. Thx!


---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r135922340
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    Actually, we do have the error stream, but we don't need to use it. I'd prefer not to use side effects in the parameters, because it is easy to misread in the code.
    
    For the code and most users, the important part is which writer it is if known. Only diagnostic tools will care about the integer value of unknown ones. How about:
    
        enum WriterId { ORC_JAVA_WRITER, ORC_CPP_WRITER, PRESTO_WRITER, UNKNOWN_WRITER}
    
        class Reader {
        
           WriterId getWriterId();
           /**
             * For unknown writer ids, get the value.
             */
           int getUnknownWriterIdValue();
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r134897076
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    @omalley can we close on this by just returning the actual code? Or you have other concerns? Thx.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r136697977
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    ok, that sounds good. I will update it in my next change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/151
  
    Can someone help review and close on this? Thx. @omalley @majetideepak 


---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/151
  
    Squash commit. Thanks @ajayyadava @majetideepak 


---

[GitHub] orc issue #151: ORC-226 Support getWriterId in c++ reader interface

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/151
  
    @xndai  can you please squash your commits? 


---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r138157541
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,18 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * @return UNKNOWN_WRITER if the writer ID is undefined
    +     */
    +    virtual WriterId getWriterId() const = 0;
    +
    +    /**
    +     * Get the writer id value when getWriterId() returns an unknown writer.
    +     * @return the integer value of the writer ID.
    +     */
    +    virtual int getUnknownWriterIdValue() const = 0;
    --- End diff --
    
    Should this be `getWriteIdValue`? I think the implementation returns both known and unknown ids?


---

[GitHub] orc pull request #151: ORC-226 Support getWriterId in c++ reader interface

Posted by xndai <gi...@git.apache.org>.
Github user xndai commented on a diff in the pull request:

    https://github.com/apache/orc/pull/151#discussion_r132589551
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -288,6 +288,17 @@ namespace orc {
         virtual uint64_t getCompressionSize() const = 0;
     
         /**
    +     * Get ID of writer that generated the file.
    +     * Current availiable Orc writers:
    +     * 0 = ORC Java
    +     * 1 = ORC C++
    +     * 2 = Presto
    +     * @param id out parameter for writer id
    +     * @return true if writer id is availiable, false if otherwise
    +     */
    +    virtual bool getWriterId(uint32_t & id) const = 0;
    --- End diff --
    
    Will there be any files generated by old Presto writer without an ID? Also if we choose to return ORC_FUTURE, there's no way to determine the actual writer ID. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---