Posted to issues@arrow.apache.org by "Xianjin YE (JIRA)" <ji...@apache.org> on 2018/03/30 06:30:00 UTC
[jira] [Commented] (ARROW-2360) Add set_chunksize for RecordBatchReader in arrow/record_batch.h
[ https://issues.apache.org/jira/browse/ARROW-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420210#comment-16420210 ]
Xianjin YE commented on ARROW-2360:
-----------------------------------
Some options:
1. Add a chunksize-related API to the RecordBatchReader base class
{code}
class ARROW_EXPORT RecordBatchReader {
 public:
  virtual ~RecordBatchReader();
  // other members are omitted

  /// Indicate whether this RecordBatchReader supports chunking. If it does
  /// not, set_chunksize has no effect.
  virtual bool isChunkingSupported() const = 0;

  /// Set the maximum chunksize for subsequent RecordBatches to be read
  virtual void set_chunksize(int64_t chunksize) {}

  /// Not decided yet; may be handy when the user needs to know the current chunksize
  virtual int64_t get_chunksize() const { return -1; }
};

// RecordBatchStreamReader
class ARROW_EXPORT RecordBatchStreamReader : public RecordBatchReader {
 public:
  // other members are omitted
  // const + override so this actually overrides the pure virtual in the base class
  bool isChunkingSupported() const override { return false; }
};

// TableBatchReader
class ARROW_EXPORT TableBatchReader : public RecordBatchReader {
 public:
  ~TableBatchReader() override;
  // other members are omitted
  bool isChunkingSupported() const override { return true; }
  void set_chunksize(int64_t chunksize) override;
};
{code}
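To illustrate how a caller might use option 1, here is a minimal, self-contained sketch. It does not use the real Arrow classes: {{RecordBatchReader}} below is a simplified stand-in, and {{TableLikeReader}}, {{StreamLikeReader}}, {{ReadNextRows}} and {{ReadAll}} are hypothetical names introduced only for this example. Batches are modelled as plain row counts.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Simplified stand-in for the proposed base class (option 1); not the real Arrow class.
class RecordBatchReader {
 public:
  virtual ~RecordBatchReader() = default;
  virtual bool isChunkingSupported() const = 0;
  virtual void set_chunksize(int64_t /*chunksize*/) {}
  virtual int64_t get_chunksize() const { return -1; }
  // Hypothetical: returns the row count of the next batch, or 0 when exhausted.
  virtual int64_t ReadNextRows() = 0;
};

// Hypothetical reader over an in-memory table; it can slice batches freely.
class TableLikeReader : public RecordBatchReader {
 public:
  explicit TableLikeReader(int64_t num_rows) : remaining_(num_rows) {}
  bool isChunkingSupported() const override { return true; }
  void set_chunksize(int64_t chunksize) override { chunksize_ = chunksize; }
  int64_t get_chunksize() const override { return chunksize_; }
  int64_t ReadNextRows() override {
    const int64_t n = std::min(remaining_, chunksize_);
    remaining_ -= n;
    return n;
  }

 private:
  int64_t remaining_;
  int64_t chunksize_ = std::numeric_limits<int64_t>::max();
};

// Hypothetical stream-like reader; batch sizes are fixed by the producer,
// so it keeps the no-op set_chunksize inherited from the base class.
class StreamLikeReader : public RecordBatchReader {
 public:
  explicit StreamLikeReader(std::vector<int64_t> batches)
      : batches_(std::move(batches)) {}
  bool isChunkingSupported() const override { return false; }
  int64_t ReadNextRows() override {
    return pos_ < batches_.size() ? batches_[pos_++] : 0;
  }

 private:
  std::vector<int64_t> batches_;
  std::size_t pos_ = 0;
};

// Caller side: request a chunksize only if supported, then drain the reader.
std::vector<int64_t> ReadAll(RecordBatchReader& reader, int64_t wanted) {
  if (reader.isChunkingSupported()) reader.set_chunksize(wanted);
  std::vector<int64_t> sizes;
  for (int64_t n = reader.ReadNextRows(); n > 0; n = reader.ReadNextRows()) {
    sizes.push_back(n);
  }
  return sizes;
}
```

With this shape every caller sees {{set_chunksize}}, and has to check {{isChunkingSupported()}} at runtime to know whether the request will be honoured.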
2. Add a new base class called {{ChunkableRecordBatchReader}}, and make {{TableBatchReader}} a subclass of {{ChunkableRecordBatchReader}}.
In the parquet reader, {{GetRecordBatchReader}} would then return a {{shared_ptr<ChunkableRecordBatchReader>}}.
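A rough sketch of what option 2 could look like, using simplified stand-ins rather than the real Arrow and parquet-cpp classes. It assumes that {{ChunkableRecordBatchReader}} itself derives from {{RecordBatchReader}}, which is not spelled out above. {{MakeChunkableReader}} is a hypothetical placeholder for the parquet-side factory, and the default chunksize is arbitrary.

```cpp
#include <cstdint>
#include <memory>

// Simplified stand-in for the existing base class; not the real Arrow class.
class RecordBatchReader {
 public:
  virtual ~RecordBatchReader() = default;
};

// Proposed intermediate class: only readers that can chunk derive from it.
// Assumption: it extends RecordBatchReader so it stays usable wherever a
// plain RecordBatchReader is expected.
class ChunkableRecordBatchReader : public RecordBatchReader {
 public:
  virtual void set_chunksize(int64_t chunksize) = 0;
  virtual int64_t get_chunksize() const = 0;
};

// Stand-in for TableBatchReader under option 2.
class TableBatchReader : public ChunkableRecordBatchReader {
 public:
  void set_chunksize(int64_t chunksize) override { chunksize_ = chunksize; }
  int64_t get_chunksize() const override { return chunksize_; }

 private:
  int64_t chunksize_ = 1024;  // arbitrary default, for illustration only
};

// Hypothetical placeholder for the parquet-side factory: the return type
// tells the caller statically that chunking is available.
std::shared_ptr<ChunkableRecordBatchReader> MakeChunkableReader() {
  return std::make_shared<TableBatchReader>();
}
```

Here the chunking capability is visible in the static type, so no runtime check is needed, and a stream-style reader simply never exposes {{set_chunksize}}. The cost is one more class in the hierarchy.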
[~wesmckinn], which one do you prefer, or do you have other options in mind?
> Add set_chunksize for RecordBatchReader in arrow/record_batch.h
> ---------------------------------------------------------------
>
> Key: ARROW-2360
> URL: https://issues.apache.org/jira/browse/ARROW-2360
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Xianjin YE
> Priority: Major
>
> As discussed in [https://github.com/apache/parquet-cpp/pull/445],
> it may be better to expose a chunksize-related API in RecordBatchReader.
>
> However, RecordBatchStreamReader doesn't conform to this requirement.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)