Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2022/06/09 12:31:00 UTC

[jira] [Resolved] (IMPALA-8011) Allow filtering on virtual column for file name

     [ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-8011.
---------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Allow filtering on virtual column for file name
> -----------------------------------------------
>
>                 Key: IMPALA-8011
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8011
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Peter Ebert
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: built-in-function
>             Fix For: Impala 4.2.0
>
>
> An additional performance enhancement would be the ability to filter on file names using a virtual column. This would be somewhat like the current optimization of sorting data and skipping files based on Parquet metadata, but instead you put something in the file name to indicate that its contents should be filtered.
> For example, say you were writing first names and then searching for them. During the writing phase you put the first letter of each first name into the file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". When searching for David, you could then filter on whether INPUT__FILE__NAME contains "D" and skip reading the file.
> Another use would be if you had a daily partition and put the timestamp into the file name: you could then limit the search to only the last hour even though the partition is daily. This also leaves you free to sort by another column, making searches on both even faster.
>  
> This requires IMPALA-801
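
The first-letter example in the description can be sketched in a few lines of Python. This only illustrates the pruning idea and is not Impala's implementation: the function name `files_to_scan` and the paths are hypothetical, and it assumes the part of the file name before the extension holds the first letters of the names stored in that file.

```python
import posixpath


def files_to_scan(paths, first_name):
    """Keep only the files whose name says they may contain first_name.

    Assumption (hypothetical naming scheme from the ticket): the file name
    without its extension lists the first letters of all names in the file.
    """
    letter = first_name[0].upper()
    selected = []
    for path in paths:
        # Drop the directory and the extension so that letters in ".parq"
        # are not mistaken for name initials.
        stem, _ext = posixpath.splitext(posixpath.basename(path))
        if letter in stem.upper():
            selected.append(path)
    return selected


# Hypothetical files: one holds Alice, Bob, Cathy; the other David, Eve, Frank.
paths = [
    "/warehouse/names/ABC.parq",
    "/warehouse/names/DEF.parq",
]

# Searching for David only needs the second file; the first is never opened.
print(files_to_scan(paths, "David"))
```

In a query, the same check would be expressed as a predicate on the file-name virtual column, so that non-matching files can be skipped before any of their rows are read.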



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
