Posted to issues@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2024/04/02 13:02:40 UTC

[I] [C++][Parquet] Revisit is_sorted flag in Parquet DictionaryPageHeader [arrow]

mapleFU opened a new issue, #40948:
URL: https://github.com/apache/arrow/issues/40948

   ### Describe the enhancement requested
   
   The Parquet format defines an `is_sorted` flag in `DictionaryPageHeader` [1]. However, no implementation seems to set it. `is_sorted` is useful when the input data is ordered and a filter is built on the dictionary: a sorted dictionary can be searched directly, so the "dictionary filter" becomes fast (no hash table needs to be built).
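   
   As a rough illustration of why a sorted dictionary helps a reader, here is a minimal sketch of a range check that uses binary search instead of a hash table. `DictMayMatchRange` is a hypothetical name, not an Arrow or parquet-cpp API, and the sketch assumes an `int64_t` dictionary sorted in ascending order:
   
   ```cpp
   #include <algorithm>
   #include <cstdint>
   #include <vector>
   
   // Hypothetical helper (not part of Arrow/parquet-cpp).
   // Returns true if at least one dictionary entry lies in [lo, hi].
   // Assumes `dict` is sorted ascending, i.e. the writer set is_sorted.
   bool DictMayMatchRange(const std::vector<int64_t>& dict, int64_t lo,
                          int64_t hi) {
     // First entry that is >= lo, found in O(log n).
     auto it = std::lower_bound(dict.begin(), dict.end(), lo);
     // The range matches only if that entry exists and is <= hi.
     return it != dict.end() && *it <= hi;
   }
   ```
   
   If the function returns false, no value in the column chunk can satisfy the predicate, so the chunk could be skipped without decoding the data pages.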
   
   This requires setting the `is_sorted` flag when writing the dictionary page. We can cheaply check whether the dictionary is still sorted as each batch is written, like:
   
   ```
   WriteBatchToDict(RecordBatch) {
     // Once the dictionary is known to be unsorted, skip further checks.
     if (is_sorted) {
       is_sorted = checkDict(RecordBatch)
     }
     // do write
   }
   ```
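   
   The pseudocode above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: `SortedDictTracker` and `Observe` are hypothetical names rather than existing parquet-cpp code, the dictionary holds `int64_t` values, and the caller passes the entries newly added to the dictionary in insertion order:
   
   ```cpp
   #include <cstdint>
   #include <optional>
   #include <vector>
   
   // Hypothetical helper (not part of Arrow/parquet-cpp): tracks whether the
   // dictionary entries appended so far are in strictly ascending order.
   class SortedDictTracker {
    public:
     // `new_entries`: entries one batch added to the dictionary, in the
     // order they were inserted.
     void Observe(const std::vector<int64_t>& new_entries) {
       if (!is_sorted_) return;  // already known to be unsorted; skip the scan
       for (int64_t v : new_entries) {
         // Dictionary entries are unique, so "sorted" means strictly greater
         // than the previous entry.
         if (last_.has_value() && v <= *last_) {
           is_sorted_ = false;
           return;
         }
         last_ = v;
       }
     }
   
     // Value to store in the dictionary page header's is_sorted field.
     bool is_sorted() const { return is_sorted_; }
   
    private:
     bool is_sorted_ = true;
     std::optional<int64_t> last_;  // last entry seen so far, if any
   };
   ```
   
   Each new entry costs one comparison, and the check stops permanently once a single out-of-order entry is seen, so the overhead on unsorted input should stay small.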
   
   [1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L613-L614
   
   ### Component(s)
   
   C++, Parquet

