
Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/03/25 15:37:00 UTC

[jira] [Closed] (HUDI-3594) Support standard Spark functions in Filter Exprs in Data Skipping

     [ https://issues.apache.org/jira/browse/HUDI-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan closed HUDI-3594.
-------------------------------------
    Resolution: Fixed

> Support standard Spark functions in Filter Exprs in Data Skipping
> -----------------------------------------------------------------
>
>                 Key: HUDI-3594
>                 URL: https://issues.apache.org/jira/browse/HUDI-3594
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> As part of this effort, we plan to support (at the very least) a suite of standard Spark functions when evaluating data-filtering expressions within the Data Skipping flow. For example, when a user issues the following query:
>  
> {code:java}
> SELECT ... WHERE date_format(ts, 'yyyy-MM-dd') > '2022-01-01'
> {code}
> We are able to relate such a query to our Column Stats Index appropriately, and can therefore apply Data Skipping not only to the "raw" columns, but also to simple derived expressions on top of them (such as standard function calls).
>  
> *It is important to note that only transformations that _preserve the ordering of the source column_ can be applied. Transformations that do not preserve the ordering render the Column Stats Index practically irrelevant (since no assumption can be made that the values in the column derived by such transformations are ordered).*
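
The ordering requirement can be illustrated with a minimal sketch. This is not Hudi's implementation or API: FileStats, to_iso, to_day_first and files_to_scan are hypothetical names, and plain Python string comparison stands in for the comparison Spark would perform on the formatted value. The idea is that per-file min/max values can only be pushed through a function when that function is monotonic, so that the transformed maximum is still an upper bound for every transformed value in the file.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max entry standing in for a column stats record."""
    path: str
    min_ts: date
    max_ts: date


def to_iso(d: date) -> str:
    # Order-preserving: ISO strings sort the same way as the dates they encode.
    return d.strftime("%Y-%m-%d")


def to_day_first(d: date) -> str:
    # Not order-preserving: the day field dominates the string comparison.
    return d.strftime("%d-%m-%Y")


def files_to_scan(stats: List[FileStats],
                  transform: Callable[[date], str],
                  lower_bound: str) -> List[str]:
    """Keep files that may contain rows with transform(ts) > lower_bound.

    Only valid when transform is monotonically non-decreasing, because only
    then is transform(max_ts) an upper bound for the whole file.
    """
    return [s.path for s in stats if transform(s.max_ts) > lower_bound]


stats = [
    FileStats("f1.parquet", date(2021, 1, 1), date(2021, 12, 31)),
    FileStats("f2.parquet", date(2021, 12, 1), date(2022, 3, 31)),
    FileStats("f3.parquet", date(2022, 1, 1), date(2022, 1, 1)),
]

# With the order-preserving transform, pruning on the transformed max is safe.
pruned = files_to_scan(stats, to_iso, "2022-01-01")

# With the day-first transform the bound breaks: row <= file_max as dates,
# yet the formatted row compares greater than the formatted max.
row, file_max = date(2021, 12, 31), date(2022, 1, 1)
order_broken = to_day_first(row) > to_day_first(file_max)
```

In this sketch only f2.parquet survives pruning with to_iso, while the day-first case shows why a formatted max cannot be trusted as a bound: a file spanning 2021-12-31 to 2022-01-01 would be skipped even though one of its rows satisfies the string predicate.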



--
This message was sent by Atlassian Jira
(v8.20.1#820001)