Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/05/22 10:35:40 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #4252: Skip Computing Nulls for Non-Nullable Parquet Columns

tustvold opened a new issue, #4252:
URL: https://github.com/apache/arrow-rs/issues/4252

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Currently `DefinitionLevelDecoder` always computes a null mask if any parent of a column is nullable, even when the column itself is not.
   
   This is not only wasteful, but also dubiously correct, as it can produce a `StructArray` that would fail the nullability checks.
   
   Consider the following schema:
   
   ```
   optional group l1 {
       optional group l2 {
           required INT32 leaf;
       }
   }
   ```
   
   `PrimitiveArrayDecoder` will decode `leaf` with a null buffer. `l2` will then be decoded as a `StructArray` without a null buffer, with `leaf` as a child. This is technically ill-formed: the `Field` for `leaf` states that it is not nullable, yet `leaf` has a null buffer.
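   
   To make the problem concrete, here is a minimal standard-library sketch of how a mask derived purely from definition levels ends up marking slots of the required `leaf` as null. It is not the arrow-rs implementation: `mask_from_def_levels` and `MAX_DEF_LEVEL` are hypothetical names, and the only assumption taken from Parquet is that the maximum definition level of `leaf` is 2, one for each optional ancestor.
   
   ```rust
   /// Maximum definition level of `leaf`: one per optional ancestor (l1, l2).
   /// `leaf` itself is required, so it contributes nothing.
   const MAX_DEF_LEVEL: i16 = 2;
   
   /// Hypothetical helper: a slot is marked valid only when its definition
   /// level reaches the maximum for the column.
   fn mask_from_def_levels(def_levels: &[i16]) -> Vec<bool> {
       def_levels.iter().map(|&d| d == MAX_DEF_LEVEL).collect()
   }
   
   fn main() {
       // Three records: l1 is null (0), l2 is null (1), leaf is defined (2).
       let def_levels = [0, 1, 2];
       let leaf_mask = mask_from_def_levels(&def_levels);
   
       // The required leaf now carries null slots, although every null
       // actually belongs to l1 or l2.
       let null_count = leaf_mask.iter().filter(|valid| !**valid).count();
       println!("leaf mask: {:?}, nulls: {}", leaf_mask, null_count);
   }
   ```
   
   In this model the nulls recorded on `leaf` carry no information of its own; they only repeat what the definition levels already say about `l1` and `l2`.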
   
   **Describe the solution you'd like**
   
   `DefinitionLevelDecoder` should skip decoding a null buffer for non-nullable columns. Some additional care may be necessary for dictionaries, to ensure that the dictionary offsets are valid in isolation.
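   
   A rough sketch of what this could look like, using only the standard library. `LeafColumn`, `decode_validity` and `pad_keys` are hypothetical names, not arrow-rs APIs. The padding step reflects one reading of the dictionary concern: slots that exist only because a parent is null still need a key that points inside the dictionary, and the sketch assumes key 0 is always valid (a non-empty dictionary).
   
   ```rust
   /// Hypothetical stand-in for the decoder's per-column configuration.
   struct LeafColumn {
       max_def_level: i16,
       /// `false` for `required` leaves.
       nullable: bool,
   }
   
   /// Returns a validity mask only when the leaf itself is nullable.
   /// For a required leaf, nulls belong to the ancestors, so no mask is built.
   fn decode_validity(col: &LeafColumn, def_levels: &[i16]) -> Option<Vec<bool>> {
       if !col.nullable {
           return None;
       }
       Some(def_levels.iter().map(|&d| d == col.max_def_level).collect())
   }
   
   /// Expands the decoded dictionary keys to one slot per definition level.
   /// Slots introduced by a null parent get key 0 (assumed to be a valid
   /// index) so the keys remain valid without consulting a mask.
   fn pad_keys(col: &LeafColumn, def_levels: &[i16], keys: &[u32]) -> Vec<u32> {
       let mut decoded = keys.iter().copied();
       def_levels
           .iter()
           .map(|&d| {
               if d == col.max_def_level {
                   decoded.next().expect("one key per fully defined slot")
               } else {
                   0
               }
           })
           .collect()
   }
   
   fn main() {
       let leaf = LeafColumn { max_def_level: 2, nullable: false };
       let def_levels = [0, 1, 2, 2];
   
       // No mask is computed for the required leaf.
       assert!(decode_validity(&leaf, &def_levels).is_none());
   
       // Padded slots still reference a valid dictionary entry.
       println!("{:?}", pad_keys(&leaf, &def_levels, &[3, 1]));
   }
   ```
   
   Returning `Option` keeps the nullable path unchanged while letting the required path skip the allocation entirely.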
   
   **Describe alternatives you've considered**
   
   **Additional context**
   

