Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/03/30 01:26:25 UTC

[jira] [Created] (PARQUET-576) C++: Simplify RandomAccessSource reads producing InputStream

Wes McKinney created PARQUET-576:
------------------------------------

             Summary: C++: Simplify RandomAccessSource reads producing InputStream
                 Key: PARQUET-576
                 URL: https://issues.apache.org/jira/browse/PARQUET-576
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-cpp
            Reporter: Wes McKinney


Presently, we have code like this:

{code}
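  // Read the whole compressed column chunk into a buffer.
  int64_t bytes_to_read = col.meta_data.total_compressed_size;
  std::shared_ptr<Buffer> buffer = source_->ReadAt(col_start, bytes_to_read);

  // A short read means the chunk is truncated or the source is broken.
  if (buffer->size() < bytes_to_read) {
    throw ParquetException("Unable to read column chunk data");
  }

  // Wrap the buffer so that stream-based interfaces can consume it.
  std::unique_ptr<InputStream> stream(new InMemoryInputStream(buffer));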
{code}

This looks like a leaky detail (some interfaces expect streams, not buffers) that could be encapsulated in the data source class. Doing so would later let us work on "lazy" stream instances without affecting downstream users.
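
One possible shape for this, as a rough sketch only: the read, the short-read check, and the stream construction move into a single method on the source. The method name GetStream, the VectorSource test source, and the simplified Buffer / InputStream / InMemoryInputStream definitions below are assumptions made to keep the example self-contained; they are not the actual parquet-cpp declarations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Simplified stand-ins for the types named in this issue (assumed shapes).
class ParquetException : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

class Buffer {
 public:
  explicit Buffer(std::vector<uint8_t> data) : data_(std::move(data)) {}
  int64_t size() const { return static_cast<int64_t>(data_.size()); }
  const uint8_t* data() const { return data_.data(); }

 private:
  std::vector<uint8_t> data_;
};

class InputStream {
 public:
  virtual ~InputStream() = default;
  // Returns a pointer to at most num_to_read bytes and advances the stream;
  // the number of bytes actually available is written to *num_bytes.
  virtual const uint8_t* Read(int64_t num_to_read, int64_t* num_bytes) = 0;
};

class InMemoryInputStream : public InputStream {
 public:
  explicit InMemoryInputStream(std::shared_ptr<Buffer> buffer)
      : buffer_(std::move(buffer)) {
    assert(buffer_ != nullptr);
  }

  const uint8_t* Read(int64_t num_to_read, int64_t* num_bytes) override {
    *num_bytes = std::min(num_to_read, buffer_->size() - offset_);
    const uint8_t* result = buffer_->data() + offset_;
    offset_ += *num_bytes;
    return result;
  }

 private:
  std::shared_ptr<Buffer> buffer_;
  int64_t offset_ = 0;
};

class RandomAccessSource {
 public:
  virtual ~RandomAccessSource() = default;
  virtual std::shared_ptr<Buffer> ReadAt(int64_t pos, int64_t nbytes) = 0;

  // Hypothetical helper: callers ask for a stream and never see the buffer.
  // A later "lazy" implementation could replace this body without touching
  // any call site.
  std::unique_ptr<InputStream> GetStream(int64_t pos, int64_t nbytes) {
    std::shared_ptr<Buffer> buffer = ReadAt(pos, nbytes);
    if (buffer->size() < nbytes) {
      throw ParquetException("Unable to read column chunk data");
    }
    return std::unique_ptr<InputStream>(new InMemoryInputStream(buffer));
  }
};

// Hypothetical in-memory source, used here only to exercise GetStream.
class VectorSource : public RandomAccessSource {
 public:
  explicit VectorSource(std::vector<uint8_t> data) : data_(std::move(data)) {}

  std::shared_ptr<Buffer> ReadAt(int64_t pos, int64_t nbytes) override {
    // Clamp the requested range to the available data; may return fewer
    // bytes than requested.
    const int64_t size = static_cast<int64_t>(data_.size());
    const int64_t begin = std::min(std::max<int64_t>(pos, 0), size);
    const int64_t end = std::min(begin + std::max<int64_t>(nbytes, 0), size);
    return std::make_shared<Buffer>(
        std::vector<uint8_t>(data_.begin() + begin, data_.begin() + end));
  }

 private:
  std::vector<uint8_t> data_;
};
```

Under these assumptions, the call site quoted above would shrink to a single line along the lines of `std::unique_ptr<InputStream> stream = source_->GetStream(col_start, bytes_to_read);`.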



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)