You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/03/24 09:50:00 UTC

[GitHub] [arrow] mapleFU opened a new issue, #34722: [C++][Parquet] Minor: Declare lifetime of `shared_ptr` on PageReader

mapleFU opened a new issue, #34722:
URL: https://github.com/apache/arrow/issues/34722

   ### Describe the enhancement requested
   
   In C++ Parquet, `SerializedPageReader` holds a `decryption_buffer_` and `decompression_buffer_`, and when `NextPage` called, it will reuse that buffer. So, `Page`'s  buffer data is bounded on `SerializedPageReader::NextPage`, when NextPage is called, it's data might be reset.
   
   I have two problems here:
   1. If it's by design, why don't we cache raw data in SerializePageReader?
   2. Should we note these in comments?
   
   ### Component(s)
   
   C++, Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 closed issue #34722: [C++][Parquet] Minor: Declare lifetime of `shared_ptr` on PageReader

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 closed issue #34722: [C++][Parquet] Minor: Declare lifetime of `shared_ptr<Page>` on PageReader
URL: https://github.com/apache/arrow/issues/34722


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #34722: [C++][Parquet] Minor: Declare lifetime of `shared_ptr` on PageReader

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #34722:
URL: https://github.com/apache/arrow/issues/34722#issuecomment-1482995057

   The interface:
   
   ```c++
   // Abstract page iterator interface. This way, we can feed column pages to the
   // ColumnReader through whatever mechanism we choose
   class PARQUET_EXPORT PageReader {
     using DataPageFilter = std::function<bool(const DataPageStats&)>;
   
    public:
     virtual ~PageReader() = default;
   
     // @returns: shared_ptr<Page>(nullptr) on EOS, std::shared_ptr<Page>
     // containing new Page otherwise
     virtual std::shared_ptr<Page> NextPage() = 0;
   ```
   
   The actual:
   
   ```c++
   // This subclass delimits pages appearing in a serialized stream, each preceded
   // by a serialized Thrift format::PageHeader indicating the type of each page
   // and the page metadata.
   class SerializedPageReader : public PageReader {
    public:
     SerializedPageReader(std::shared_ptr<ArrowInputStream> stream, int64_t total_num_values,
                          Compression::type codec, const ReaderProperties& properties,
                          const CryptoContext* crypto_ctx, bool always_compressed)
         : properties_(properties),
           stream_(std::move(stream)),
           decompression_buffer_(AllocateBuffer(properties_.memory_pool(), 0)),
           decryption_buffer_(AllocateBuffer(properties_.memory_pool(), 0)) {}
   ```
   
   It (implicitly) depend on `SerializedPageReader`'s buffer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #34722: [C++][Parquet] Minor: Declare lifetime of `shared_ptr` on PageReader

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #34722:
URL: https://github.com/apache/arrow/issues/34722#issuecomment-1482527201

   @pitrou @wjones127 mind take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on issue #34722: [C++][Parquet] Minor: Declare lifetime of `shared_ptr` on PageReader

Posted by "cyb70289 (via GitHub)" <gi...@apache.org>.
cyb70289 commented on issue #34722:
URL: https://github.com/apache/arrow/issues/34722#issuecomment-1482785092

   It might be easier to review if you can paster the code link here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org