Posted to issues@arrow.apache.org by "Xianjin YE (JIRA)" <ji...@apache.org> on 2018/03/30 06:30:00 UTC

[jira] [Commented] (ARROW-2360) Add set_chunksize for RecordBatchReader in arrow/record_batch.h

    [ https://issues.apache.org/jira/browse/ARROW-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420210#comment-16420210 ] 

Xianjin YE commented on ARROW-2360:
-----------------------------------

Some options:
 1. Add a chunksize-related API to the RecordBatchReader base class:
{code}
class ARROW_EXPORT RecordBatchReader {
 public:
  virtual ~RecordBatchReader();
  // other impls are omitted

  /// Indicates whether this RecordBatchReader supports chunking. If it
  /// does not, set_chunksize has no effect.
  virtual bool isChunkingSupported() const = 0;

  /// Sets the maximum chunksize for the RecordBatches read from now on.
  /// The default implementation is a no-op.
  virtual void set_chunksize(int64_t chunksize) {}

  /// Not decided yet; may be handy when the user needs to know the
  /// current chunksize. Returns -1 by default.
  virtual int64_t get_chunksize() const { return -1; }

};

// RecordBatchStreamReader
class ARROW_EXPORT RecordBatchStreamReader: public RecordBatchReader {
 public:
  // other impls are omitted

  // const is required to match (and override) the base-class declaration
  bool isChunkingSupported() const override { return false; }
};

// TableBatchReader
class ARROW_EXPORT TableBatchReader : public RecordBatchReader {
 public:
  ~TableBatchReader() override;
  /// other impls are omitted
  // const is required to match (and override) the base-class declaration
  bool isChunkingSupported() const override { return true; }

  void set_chunksize(int64_t chunksize) override;
};

{code}
2. Add a new base class called {{ChunkableRecordBatchReader}}, and make {{TableBatchReader}} a subclass of it.
 In parquet-reader, {{GetRecordBatchReader}} would then return a {{shared_ptr<ChunkableRecordBatchReader>}}.
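
A rough sketch of what option 2 could look like, to make the comparison concrete. It does not depend on Arrow: {{RecordBatch}} and {{RecordBatchReader}} below are simplified stand-ins (no Status, no schema), and {{ToyTableBatchReader}}, {{chunksize()}} and {{BatchSizes}} are hypothetical names made up for this illustration only.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <memory>
#include <vector>

// Stand-in for arrow::RecordBatch: it only carries a row count here.
struct RecordBatch {
  int64_t num_rows;
};

// Simplified stand-in for arrow::RecordBatchReader (assumption: the real
// class reports errors via Status and also exposes a schema).
class RecordBatchReader {
 public:
  virtual ~RecordBatchReader() = default;
  // Returns nullptr once the reader is exhausted.
  virtual std::shared_ptr<RecordBatch> ReadNext() = 0;
};

// Option 2: only readers that can honor a chunksize derive from this class,
// so the base class needs no isChunkingSupported() query.
class ChunkableRecordBatchReader : public RecordBatchReader {
 public:
  virtual void set_chunksize(int64_t chunksize) = 0;
  virtual int64_t chunksize() const = 0;
};

// Toy analogue of TableBatchReader: hands out a fixed number of rows in
// batches of at most chunksize rows each.
class ToyTableBatchReader : public ChunkableRecordBatchReader {
 public:
  explicit ToyTableBatchReader(int64_t total_rows) : remaining_(total_rows) {}

  // Non-positive values are ignored in this sketch to avoid empty batches.
  void set_chunksize(int64_t chunksize) override {
    if (chunksize > 0) chunksize_ = chunksize;
  }
  int64_t chunksize() const override { return chunksize_; }

  std::shared_ptr<RecordBatch> ReadNext() override {
    if (remaining_ <= 0) return nullptr;
    const int64_t n = std::min(remaining_, chunksize_);
    remaining_ -= n;
    return std::make_shared<RecordBatch>(RecordBatch{n});
  }

 private:
  int64_t remaining_;
  int64_t chunksize_ = std::numeric_limits<int64_t>::max();
};

// Drains any reader through the plain base interface and collects the row
// count of each batch it yields.
std::vector<int64_t> BatchSizes(RecordBatchReader* reader) {
  std::vector<int64_t> sizes;
  while (std::shared_ptr<RecordBatch> batch = reader->ReadNext()) {
    sizes.push_back(batch->num_rows);
  }
  return sizes;
}
```

With this shape, a caller holding a {{shared_ptr<ChunkableRecordBatchReader>}} can call {{set_chunksize}} directly, while code that only needs to iterate keeps using the plain {{RecordBatchReader}} interface. {{RecordBatchStreamReader}} would stay untouched. The trade-off compared with option 1 is one more class in the hierarchy.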

[~wesmckinn], which one do you prefer? Or do you have other options in mind?

> Add set_chunksize for RecordBatchReader in arrow/record_batch.h
> ---------------------------------------------------------------
>
>                 Key: ARROW-2360
>                 URL: https://issues.apache.org/jira/browse/ARROW-2360
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Xianjin YE
>            Priority: Major
>
> As discussed in [https://github.com/apache/parquet-cpp/pull/445],
> it may be better to expose a chunksize-related API in RecordBatchReader.
>  
> However, RecordBatchStreamReader doesn't conform to this requirement.


