Posted to issues-all@impala.apache.org by "Sami Airo (JIRA)" <ji...@apache.org> on 2018/04/30 07:22:00 UTC
[jira] [Commented] (IMPALA-801) Add function or virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458370#comment-16458370 ]
Sami Airo commented on IMPALA-801:
----------------------------------
I just voted for this issue. Our main use case is external source files that are ingested into HDFS. Sometimes the ingested filenames contain metadata such as a timestamp or a source system identifier. With INPUT__FILE__NAME, this is easy to process in a SQL query.
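To illustrate the kind of filename metadata meant here, the following is a minimal Python sketch, not anything taken from Impala or Hive. It assumes a hypothetical naming convention of <source>_<YYYYMMDDHHMMSS>.<ext>; the pattern, the FileMetadata type, and the parse_file_metadata helper are all made up for this example. A file name virtual column would allow the same extraction to be expressed directly in a query.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed (hypothetical) naming convention: <source>_<YYYYMMDDHHMMSS>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<source>[A-Za-z0-9]+)_(?P<ts>\d{14})\.[A-Za-z0-9]+$"
)


class FileMetadata(NamedTuple):
    source_system: str
    ingested_at: datetime


def parse_file_metadata(path: str) -> Optional[FileMetadata]:
    """Extract source system and timestamp from a file path.

    Returns None when the base name does not follow the assumed convention.
    """
    # Keep only the last path component, e.g. "crm_20140210093000.csv".
    basename = path.rsplit("/", 1)[-1]
    match = FILENAME_PATTERN.match(basename)
    if match is None:
        return None
    try:
        timestamp = datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a valid date and time.
        return None
    return FileMetadata(match.group("source"), timestamp)
```

For example, parse_file_metadata("hdfs://nn/data/t/dt=20140210/crm_20140210093000.csv") would yield the source system "crm" and a timestamp of 2014-02-10 09:30:00, while a name such as "part-00000" yields None.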
> Add function or virtual column for file name
> --------------------------------------------
>
> Key: IMPALA-801
> URL: https://issues.apache.org/jira/browse/IMPALA-801
> Project: IMPALA
> Issue Type: New Feature
> Components: Catalog
> Affects Versions: Impala 1.2.3
> Reporter: Udai Kiran Potluri
> Priority: Minor
> Labels: built-in-function, ramp-up
>
> Hive can list the data files in a table. For example, the following query lists all the data files for a table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from <table_name> where dt='20140210' group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)