You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by majetideepak <gi...@git.apache.org> on 2016/06/20 03:37:38 UTC

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

GitHub user majetideepak opened a pull request:

    https://github.com/apache/orc/pull/41

    ORC-58: Move code for reading rows from Reader to RowReader

    This PR divides the current Reader class into Reader class and RowReader class
    Reader class reads the footer and metadata from an ORC file.
    RowReader class reads the rows from the file.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/majetideepak/orc RowReader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/orc/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #41
    
----
commit ed02d7ab668c1a61f8260e0efddf06c617658753
Author: Deepak Majeti <ma...@gmail.com>
Date:   2016-06-05T21:51:45Z

    Split curent Reader into RowReader and Reader

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/41
  
    @omalley, please let me know your feedback. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71377966
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -645,7 +645,72 @@ namespace orc {
       };
     
       /**
    -   * The interface for reading ORC files.
    +   * The interface for reading rows in ORC files.
    +   * This is an an abstract class that will subclassed as necessary.
    +   */
    +  class RowReader {
    +  public:
    +    virtual ~RowReader();
    +    /**
    +     * Get the selected type of the rows in the file. The file's row type
    +     * is projected down to just the selected columns. Thus, if the file's
    +     * type is struct<col0:int,col1:double,col2:string> and the selected
    +     * columns are "col0,col2" the selected type would be
    +     * struct<col0:int,col2:string>.
    +     * @return the root type
    +     */
    +    virtual const Type& getSelectedType() const = 0;
    +
    +    /**
    +     * Get the selected columns of the file.
    +     */
    +    virtual const std::vector<bool> getSelectedColumns() const = 0;
    +
    --- End diff --
    
    You have some trailing spaces. You can probably configure your IDE to automatically trim them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71443174
  
    --- Diff: c++/src/Reader.cc ---
    @@ -1062,37 +1062,36 @@ namespace orc {
         // PASS
       }
     
    +  RowReader::~RowReader() {
    +    // PASS
    +  }
    +
       static const uint64_t DIRECTORY_SIZE_GUESS = 16 * 1024;
     
    -  class ReaderImpl : public Reader {
    +  class ReaderImpl;
    +
    +  class RowReaderImpl : public RowReader {
       private:
         const Timezone& localTimezone;
     
         // inputs
    -    std::unique_ptr<InputStream> stream;
    -    ReaderOptions options;
    -    const uint64_t fileLength;
    -    const uint64_t postscriptLength;
         std::vector<bool> selectedColumns;
    -
    +    std::shared_ptr<InputStream> stream;
    --- End diff --
    
    I'd suggest creating a single class that has all of the state for the ReaderImpl. Then it can have the postscript, InputStream, etc. and be shared between the ReaderImpl and the set of RowReaderImpls that share the same ReaderImpl.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71029025
  
    --- Diff: c++/src/Reader.cc ---
    @@ -1108,13 +1107,75 @@ namespace orc {
         // internal methods
         proto::StripeFooter getStripeFooter(const proto::StripeInformation& info);
         void startNextStripe();
    -    void checkOrcVersion();
         void selectType(const Type& type);
    -    void readMetadata() const;
         void updateSelected(const std::list<uint64_t>& fieldIds);
         void updateSelected(const std::list<std::string>& fieldNames);
     
       public:
    +   /**
    +    * Constructor that lets the user specify additional options.
    +    * @param filereader the object to read from
    +    * @param options options for reading
    +    */
    +    RowReaderImpl(const ReaderImpl* filereader,
    --- End diff --
    
    We will have to extend the `Reader` interface with a notion of `metadata`, `ReaderOptions` etc. to achieve this. I am not sure what the right design choice is here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by swalkaus <gi...@git.apache.org>.
Github user swalkaus commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71004296
  
    --- Diff: c++/src/Reader.cc ---
    @@ -1108,13 +1107,75 @@ namespace orc {
         // internal methods
         proto::StripeFooter getStripeFooter(const proto::StripeInformation& info);
         void startNextStripe();
    -    void checkOrcVersion();
         void selectType(const Type& type);
    -    void readMetadata() const;
         void updateSelected(const std::list<uint64_t>& fieldIds);
         void updateSelected(const std::list<std::string>& fieldNames);
     
       public:
    +   /**
    +    * Constructor that lets the user specify additional options.
    +    * @param filereader the object to read from
    +    * @param options options for reading
    +    */
    +    RowReaderImpl(const ReaderImpl* filereader,
    +           std::shared_ptr<ReaderOptions> options);
    +
    +    uint64_t getMemoryUse(int stripeIx = -1) override;
    +
    +    const std::vector<bool> getSelectedColumns() const override;
    +
    +    const Type& getSelectedType() const override;
    +
    +    std::unique_ptr<ColumnVectorBatch> createRowBatch(uint64_t size
    +                                                      ) const override;
    +
    +    bool next(ColumnVectorBatch& data) override;
    +
    +    const ReaderOptions& getReaderOptions() const;
    +
    +    CompressionKind getCompression() const;
    +
    +    uint64_t getCompressionSize() const;
    +
    +    uint64_t getRowNumber() const override;
    +
    +    void seekToRow(uint64_t rowNumber) override;
    +
    +    MemoryPool* getMemoryPool() const ;
    +
    +  };
    +
    +  class ReaderImpl : public Reader {
    +    friend class RowReaderImpl;
    --- End diff --
    
    Avoid using friend.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by swalkaus <gi...@git.apache.org>.
Github user swalkaus commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71004575
  
    --- Diff: c++/src/Reader.cc ---
    @@ -1108,13 +1107,75 @@ namespace orc {
         // internal methods
         proto::StripeFooter getStripeFooter(const proto::StripeInformation& info);
         void startNextStripe();
    -    void checkOrcVersion();
         void selectType(const Type& type);
    -    void readMetadata() const;
         void updateSelected(const std::list<uint64_t>& fieldIds);
         void updateSelected(const std::list<std::string>& fieldNames);
     
       public:
    +   /**
    +    * Constructor that lets the user specify additional options.
    +    * @param filereader the object to read from
    +    * @param options options for reading
    +    */
    +    RowReaderImpl(const ReaderImpl* filereader,
    --- End diff --
    
    Should filereader be a "Reader" or "ReaderImpl"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by swalkaus <gi...@git.apache.org>.
Github user swalkaus commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r70968789
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -645,7 +645,72 @@ namespace orc {
       };
     
       /**
    -   * The interface for reading ORC files.
    +   * The interface for reading rows in ORC files.
    +   * This is an an abstract class that will subclassed as necessary.
    +   */
    +  class RowReader {
    +  public:
    +    virtual ~RowReader();
    +    /**
    +     * Get the selected type of the rows in the file. The file's row type
    +     * is projected down to just the selected columns. Thus, if the file's
    +     * type is struct<col0:int,col1:double,col2:string> and the selected
    +     * columns are "col0,col2" the selected type would be
    +     * struct<col0:int,col2:string>.
    +     * @return the root type
    +     */
    +    virtual const Type& getSelectedType() const = 0;
    +
    +    /**
    +     * Get the selected columns of the file.
    +     */
    +    virtual const std::vector<bool> getSelectedColumns() const = 0;
    +
    +    /**
    +     * Create a row batch for reading the selected columns of this file.
    +     * @param size the number of rows to read
    +     * @return a new ColumnVectorBatch to read into
    +     */
    +    virtual ORC_UNIQUE_PTR<ColumnVectorBatch> createRowBatch(uint64_t size
    +                                                             ) const = 0;
    +
    +    /**
    +     * Read the next row batch from the current position.
    +     * Caller must look at numElements in the row batch to determine how
    +     * many rows were read.
    +     * @param data the row batch to read into.
    +     * @return true if a non-zero number of rows were read or false if the
    +     *   end of the file was reached.
    +     */
    +    virtual bool next(ColumnVectorBatch& data) = 0;
    +
    +    /**
    +     * Get the row number of the first row in the previously read batch.
    +     * @return the row number of the previous batch.
    +     */
    +    virtual uint64_t getRowNumber() const = 0;
    +
    +    /**
    +     * Seek to a given row.
    +     * @param rowNumber the next row the reader should return
    +     */
    +    virtual void seekToRow(uint64_t rowNumber) = 0;
    +  
    +    /**
    +     * Estimate an upper bound on heap memory allocation by the Reader
    +     * based on the information in the file footer.
    +     * The bound is less tight if only few columns are read or compression is
    +     * used.
    +     * @param stripeIx index of the stripe to be read (if not specified,
    +     *        all stripes are considered).
    +     * @return upper bound on memory use
    +     */
    +    virtual uint64_t getMemoryUse(int stripeIx=-1) = 0;
    + 
    +  };
    +
    +  /**
    +   * The interface for reading footer in ORC files.
    --- End diff --
    
    "The interface for reading ORC file meta-data."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/41
  
    Included review comments.
    This patch is huge because I split the monolithic Reader.cc into various files
    - Reader.hh/Reader.cc: ReaderImpl and RowReaderImpl
    - Statistics.hh/Statistics.cc: ColumnStatisticsImpl, StatisticsImpl, Type*StatisticsImpl
    - StripeStream.hh/StripeStream.cc: StripeStreamsImpl, StreamInformationImpl, StripeInformationImpl
    - Options.hh: ReaderOptions, RowReaderOptions
    
    @omalley any feedback ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/41
  
    @omalley 
    https://issues.apache.org/jira/browse/ORC-127
    https://github.com/apache/orc/pull/77
    Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/41
  
    I think we should have FileContents be a shared ptr so that if the user deallocates the Reader, the RowReader won't reach into deallocated memory.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/41
  
    Deepak, 
       We got a notice of a new problem from Coverity. Can you look at it?
    
    
    ** CID 173749:  Uninitialized members  (UNINIT_CTOR)
    /c++/src/Reader.cc: 259 in orc::RowReaderImpl::RowReaderImpl(std::shared_ptr<orc::FileContents>, const orc::RowReaderOptions &)()
    
    
    ________________________________________________________________________________________________________
    *** CID 173749:  Uninitialized members  (UNINIT_CTOR)
    /c++/src/Reader.cc: 259 in orc::RowReaderImpl::RowReaderImpl(std::shared_ptr<orc::FileContents>, const orc::RowReaderOptions &)()
    253         } else {
    254           previousRow = firstRowOfStripe[firstStripe]-1;
    255         }
    256
    257         ColumnSelector column_selector(contents.get());
    258         column_selector.updateSelected(selectedColumns, options);
    >>>     CID 173749:  Uninitialized members  (UNINIT_CTOR)
    >>>     Non-static class member "rowsInCurrentStripe" is not initialized in this constructor nor in any functions that it calls.
    259       }
    260
    261       const RowReaderOptions& RowReaderImpl::getRowReaderOptions() const {
    262         return options;
    263       }
    264
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by swalkaus <gi...@git.apache.org>.
Github user swalkaus commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r70967406
  
    --- Diff: tools/src/FileContents.cc ---
    @@ -27,8 +27,10 @@
     #include <string>
     
     void printContents(const char* filename, const orc::ReaderOptions opts) {
    -  std::unique_ptr<orc::Reader> reader;
    -  reader = orc::createReader(orc::readLocalFile(std::string(filename)), opts);
    +  std::unique_ptr<orc::Reader> fileReader;
    --- End diff --
    
    Pendantic: use reader and rowReader consistently (notice the camelCase).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/41
  
    The ReaderOptions needs to split in half too:
    
    * ReaderOptions
      * setTailLocation
      * setErrorStream
      * setSerializedFooter
    
    * RowReaderOptions
      * include
      * range
      * throwOnHive11DecimalOverflow
      * forcedScaleOnHive11Decimal


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by swalkaus <gi...@git.apache.org>.
Github user swalkaus commented on the issue:

    https://github.com/apache/orc/pull/41
  
    I might rename "Reader" something like "FileTailReader" but, in general, the refactoring makes sense. The schema can be read and columns metadata known before column selection is decided.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/41
  
    Replaced with REDUNDANT_MOVE. valgrind does not show any memory leaks due to this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/orc/pull/41


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71379015
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -782,85 +847,42 @@ namespace orc {
         getColumnStatistics(uint32_t columnId) const = 0;
     
         /**
    -     * Get the type of the rows in the file. The top level is typically a
    -     * struct.
    -     * @return the root type
    -     */
    -    virtual const Type& getType() const = 0;
    -
    -    /**
    -     * Get the selected type of the rows in the file. The file's row type
    -     * is projected down to just the selected columns. Thus, if the file's
    -     * type is struct<col0:int,col1:double,col2:string> and the selected
    -     * columns are "col0,col2" the selected type would be
    -     * struct<col0:int,col2:string>.
    -     * @return the root type
    -     */
    -    virtual const Type& getSelectedType() const = 0;
    -
    -    /**
    -     * Get the selected columns of the file.
    +     * check file has correct column statistics
          */
    -    virtual const std::vector<bool> getSelectedColumns() const = 0;
    +    virtual bool hasCorrectStatistics() const = 0;
     
         /**
    -     * Create a row batch for reading the selected columns of this file.
    -     * @param size the number of rows to read
    -     * @return a new ColumnVectorBatch to read into
    +     * Get the serialized file tail.
    +     * Usefull if another reader of the same file wants to avoid re-reading
    +     * the file tail. See ReaderOptions.setSerializedFileTail().
    +     * @return a string of bytes with the file tail
          */
    -    virtual ORC_UNIQUE_PTR<ColumnVectorBatch> createRowBatch(uint64_t size
    -                                                             ) const = 0;
    +    virtual std::string getSerializedFileTail() const = 0;
     
         /**
    -     * Read the next row batch from the current position.
    -     * Caller must look at numElements in the row batch to determine how
    -     * many rows were read.
    -     * @param data the row batch to read into.
    -     * @return true if a non-zero number of rows were read or false if the
    -     *   end of the file was reached.
    +     * Get the type of the rows in the file. The top level is typically a
    +     * struct.
    +     * @return the root type
          */
    -    virtual bool next(ColumnVectorBatch& data) = 0;
    +    virtual const Type& getType() const = 0;
     
         /**
    -     * Get the row number of the first row in the previously read batch.
    -     * @return the row number of the previous batch.
    +     * @return a RowReader to read the rows
          */
    -    virtual uint64_t getRowNumber() const = 0;
    +    virtual ORC_UNIQUE_PTR<RowReader> getRowReader() const = 0;
     
         /**
    -     * Seek to a given row.
    -     * @param rowNumber the next row the reader should return
    +     * @param include update with new columns
    +     * @return a RowReader to read the rows
          */
    -    virtual void seekToRow(uint64_t rowNumber) = 0;
    +    virtual ORC_UNIQUE_PTR<RowReader>
    +    getRowReader(const std::list<uint64_t>& include) const = 0;
    --- End diff --
    
    Let's go ahead and make a RowReaderOptions class and pass that in here. It is almost guaranteed that the include vector will not be the only option that we want to pass in. One of the options will be to specify the include vector.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc pull request #41: ORC-58: Move code for reading rows from Reader to RowR...

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/41#discussion_r71378147
  
    --- Diff: c++/include/orc/Reader.hh ---
    @@ -645,7 +645,72 @@ namespace orc {
       };
     
       /**
    -   * The interface for reading ORC files.
    +   * The interface for reading rows in ORC files.
    +   * This is an an abstract class that will subclassed as necessary.
    +   */
    +  class RowReader {
    +  public:
    +    virtual ~RowReader();
    +    /**
    +     * Get the selected type of the rows in the file. The file's row type
    +     * is projected down to just the selected columns. Thus, if the file's
    +     * type is struct<col0:int,col1:double,col2:string> and the selected
    +     * columns are "col0,col2" the selected type would be
    +     * struct<col0:int,col2:string>.
    +     * @return the root type
    +     */
    +    virtual const Type& getSelectedType() const = 0;
    +
    +    /**
    +     * Get the selected columns of the file.
    +     */
    +    virtual const std::vector<bool> getSelectedColumns() const = 0;
    +
    +    /**
    +     * Create a row batch for reading the selected columns of this file.
    +     * @param size the number of rows to read
    +     * @return a new ColumnVectorBatch to read into
    +     */
    +    virtual ORC_UNIQUE_PTR<ColumnVectorBatch> createRowBatch(uint64_t size
    +                                                             ) const = 0;
    +
    +    /**
    +     * Read the next row batch from the current position.
    +     * Caller must look at numElements in the row batch to determine how
    +     * many rows were read.
    +     * @param data the row batch to read into.
    +     * @return true if a non-zero number of rows were read or false if the
    +     *   end of the file was reached.
    +     */
    +    virtual bool next(ColumnVectorBatch& data) = 0;
    +
    +    /**
    +     * Get the row number of the first row in the previously read batch.
    +     * @return the row number of the previous batch.
    +     */
    +    virtual uint64_t getRowNumber() const = 0;
    +
    +    /**
    +     * Seek to a given row.
    +     * @param rowNumber the next row the reader should return
    +     */
    +    virtual void seekToRow(uint64_t rowNumber) = 0;
    +  
    +    /**
    +     * Estimate an upper bound on heap memory allocation by the Reader
    +     * based on the information in the file footer.
    +     * The bound is less tight if only few columns are read or compression is
    +     * used.
    +     * @param stripeIx index of the stripe to be read (if not specified,
    +     *        all stripes are considered).
    +     * @return upper bound on memory use
    +     */
    +    virtual uint64_t getMemoryUse(int stripeIx=-1) = 0;
    --- End diff --
    
    This should stay in the reader.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by majetideepak <gi...@git.apache.org>.
Github user majetideepak commented on the issue:

    https://github.com/apache/orc/pull/41
  
    I updated the patch to use a `shared_ptr` for `FileContents`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] orc issue #41: ORC-58: Move code for reading rows from Reader to RowReader

Posted by omalley <gi...@git.apache.org>.
Github user omalley commented on the issue:

    https://github.com/apache/orc/pull/41
  
    This is looking good. To compile on MacOS, I needed to replace some of the std::move with REDUNDANT_MOVE. See omalley:pr/41 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---