Posted to issues-all@impala.apache.org by "Sami Airo (JIRA)" <ji...@apache.org> on 2018/04/30 07:22:00 UTC

[jira] [Commented] (IMPALA-801) Add function or virtual column for file name

    [ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458370#comment-16458370 ] 

Sami Airo commented on IMPALA-801:
----------------------------------

Just voted for this issue. Our main use case is external source files that are ingested into HDFS. The ingested filenames sometimes contain metadata such as a timestamp or a source system identifier. With INPUT__FILE__NAME, this metadata is easy to process in a SQL query.
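
A minimal sketch of that use case, written for Hive, where INPUT__FILE__NAME already exists. The table name ingest_events and the file naming scheme <source>_<yyyyMMddHHmmss>.csv are assumptions made up for illustration, and the pattern assumes the source identifier itself contains no underscore:
{noformat}
-- Hypothetical table and naming scheme, for illustration only.
-- INPUT__FILE__NAME holds the full path of the file each row was read from,
-- so the pattern is anchored at the end of the string to match the file name.
select source_system, file_ts, count(*) as record_count
from (
  select
    regexp_extract(INPUT__FILE__NAME, '([^/_]+)_([0-9]{14})[.]csv$', 1) as source_system,
    regexp_extract(INPUT__FILE__NAME, '([^/_]+)_([0-9]{14})[.]csv$', 2) as file_ts
  from ingest_events
) f
group by source_system, file_ts;
{noformat}
Rows from files that do not match the assumed pattern end up with empty strings in both columns, which makes unexpected file names easy to spot.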

> Add function or virtual column for file name
> --------------------------------------------
>
>                 Key: IMPALA-801
>                 URL: https://issues.apache.org/jira/browse/IMPALA-801
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Catalog
>    Affects Versions: Impala 1.2.3
>            Reporter: Udai Kiran Potluri
>            Priority: Minor
>              Labels: built-in-function, ramp-up
>
> Hive can list the data files in a table. For example, the following query lists all the data files for the table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from <table_name> where dt='20140210' group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.



