Posted to dev@parquet.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2015/10/28 17:37:27 UTC

[jira] [Commented] (PARQUET-389) Filter predicates should work with missing columns

    [ https://issues.apache.org/jira/browse/PARQUET-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978702#comment-14978702 ] 

Ryan Blue commented on PARQUET-389:
-----------------------------------

I agree, assuming that by "merged" you mean resolving the requested schema against different file schemas.

> Filter predicates should work with missing columns
> --------------------------------------------------
>
>                 Key: PARQUET-389
>                 URL: https://issues.apache.org/jira/browse/PARQUET-389
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0
>            Reporter: Cheng Lian
>
> This issue originates from SPARK-11103, which contains detailed information about how to reproduce it.
> The major problem is that pushed-down filter predicates assert that every column they touch exists in the target physical files. That does not hold when schemas are merged.
> The assertion is actually unnecessary: if a column is missing in the filter schema, the column is considered to be filled with nulls, and all filters should be able to act accordingly. For example, if we push down {{a = 1}} but {{a}} is missing in the underlying physical file, all records in this file should be dropped, since {{a}} is always null. On the other hand, if we push down {{a IS NULL}}, all records should be preserved.
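
The null-filling semantics described in the issue can be illustrated with a small sketch. This is not parquet-mr code: the helper names (column_value, eq, is_null, apply_filter) and the sample records are hypothetical, and records are modelled as plain dictionaries, assuming that a column absent from a file simply reads as null.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def column_value(record: Record, column: str) -> Optional[Any]:
    # Assumption: a column missing from the physical file reads as null (None)
    # instead of raising an error.
    return record.get(column)


def eq(column: str, value: Any) -> Predicate:
    # SQL-style equality: a null never equals a value, so such records are dropped.
    def predicate(record: Record) -> bool:
        v = column_value(record, column)
        return v is not None and v == value
    return predicate


def is_null(column: str) -> Predicate:
    # True for explicit nulls and for columns the file does not contain.
    def predicate(record: Record) -> bool:
        return column_value(record, column) is None
    return predicate


def apply_filter(records: List[Record], predicate: Predicate) -> List[Record]:
    # Keep only the records that satisfy the predicate.
    return [r for r in records if predicate(r)]


# Hypothetical file written before column "a" existed in the merged schema.
old_file = [{"b": 10}, {"b": 20}]
# Hypothetical file that does contain column "a".
new_file = [{"a": 1, "b": 30}, {"a": None, "b": 40}]

dropped = apply_filter(old_file, eq("a", 1))    # a = 1 on a file without "a"
kept = apply_filter(old_file, is_null("a"))     # a IS NULL on a file without "a"
mixed = apply_filter(new_file, eq("a", 1))      # a = 1 on a file with "a"
```

Under these assumptions, a = 1 drops every record of the file that lacks the column, a IS NULL preserves all of them, and files that do contain the column are filtered as usual.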


