Posted to user@flink.apache.org by John Smith <ja...@gmail.com> on 2020/07/31 15:53:14 UTC

Is there a way to get file "metadata" as part of stream?

Hi, I am reading a CSV file using env.readFile() with RowCsvInputFormat.

Is there a way to get the filename as part of the row stream?

The file name contains a unique identifier that I would like to tag the rows with.

Re: Is there a way to get file "metadata" as part of stream?

Posted by Till Rohrmann <tr...@apache.org>.
Hi John,

Out of the box, Flink does not provide this functionality. However, you
might be able to write your own CsvInputFormat that overrides fillRecord
so that it generates a CSV record whose first field contains the
filename. You can obtain the filename from the currentSplit field. I
haven't tried it out myself, though.
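
To make the idea concrete, here is a minimal sketch in plain Java with no
Flink classes, so it runs on its own. It only shows the step that an
overridden fillRecord would perform: putting the filename in front of the
parsed values. The helper names prependFileName and fileNameOf and the
example path are made up for illustration. In a real input format you would
presumably read the name from currentSplit (for example via
currentSplit.getPath().getName()) and also declare one extra String field in
the format's field types; please verify both against your Flink version.

```java
import java.util.Arrays;

// Plain-Java stand-in for the logic that would live inside an overridden
// fillRecord(...) of a custom CSV input format. Nothing here is Flink API.
final class FileNameTagger {

    // Hypothetical helper: returns a copy of the parsed CSV values with the
    // file name placed in the first position.
    static Object[] prependFileName(String fileName, Object[] parsedValues) {
        Object[] tagged = new Object[parsedValues.length + 1];
        tagged[0] = fileName;
        System.arraycopy(parsedValues, 0, tagged, 1, parsedValues.length);
        return tagged;
    }

    // Hypothetical helper: takes the last segment of a '/'-separated path.
    // Inside a Flink input format, the path object of the current split
    // would be the more natural source for this value (assumption).
    static String fileNameOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    public static void main(String[] args) {
        // Made-up example path and row values.
        Object[] parsed = {"42", "foo"};
        Object[] tagged =
                prependFileName(fileNameOf("hdfs:///data/in/abc123.csv"), parsed);
        System.out.println(Arrays.toString(tagged)); // prints [abc123.csv, 42, foo]
    }
}
```

Since the filename ends up as an ordinary first field, the unique identifier
can then be extracted from it downstream with a regular map function.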

Cheers,
Till

On Fri, Jul 31, 2020 at 5:54 PM John Smith <ja...@gmail.com> wrote:

> Hi, so reading a CSV file using env.readFile() with RowCsvInputFormat.
>
> Is there a way to get the filename as part of the row stream?
>
> The file contains a unique identifier to tag the rows with.
>