Posted to dev@flink.apache.org by "Ruben Laguna (Jira)" <ji...@apache.org> on 2020/10/30 13:01:00 UTC

[jira] [Created] (FLINK-19903) Implement equivalent of Spark's f.input_file_name()

Ruben Laguna created FLINK-19903:
------------------------------------

             Summary: Implement equivalent of Spark's f.input_file_name()
                 Key: FLINK-19903
                 URL: https://issues.apache.org/jira/browse/FLINK-19903
             Project: Flink
          Issue Type: Improvement
          Components: API / Core
            Reporter: Ruben Laguna


Use case: 

I have a dataset (200k files) in which some information is embedded in the
file names, and I need to extract it as a new column.

In Spark I could write
`.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
but I don't see how I can do the same with Flink.
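
To make clear what that Spark expression extracts, here is a minimal plain-Python sketch of the same string logic applied to a single path. The function name `id_from_path` and the sample path are hypothetical and only serve as illustration; this is neither Spark nor Flink code.

```python
def id_from_path(path: str) -> str:
    """Mirror the Spark expression above on a single path string."""
    # f.reverse(f.split(input_file_name(), '/'))[0] -> last path segment
    file_name = path.split("/")[-1]
    # f.split(file_name, '\.')[0] -> everything before the first dot
    return file_name.split(".")[0]


# Hypothetical path, for illustration only
print(id_from_path("hdfs://namenode/data/abc123.csv.gz"))  # prints abc123
```

So the new column holds the base name of each input file, without directories or extensions.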

 

Apparently there is FLIP-107 [1], which would allow SQL connectors and formats to expose metadata.

 

So it would be great if the Filesystem SQL connector exposed the path.

Ideally, for me, the path would be exposed via a function that reads the metadata, so I could write something akin to `SELECT input_file_name(),* FROM table1`

 

 

[1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors

[2]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)