Posted to user@flink.apache.org by Dan Hill <qu...@gmail.com> on 2021/02/07 23:40:07 UTC

UUID in part files

Hi.

*Context*
I'm migrating my Flink SQL job to DataStream.  When switching to
StreamingFileSink, I noticed that the part files now do not have a uuid in
them.  "part-0-0" vs "part-{uuid string}-0-0".  This is easy to add with
OutputFileConfig.

*Question*
Is there a reason why the base OutputFileConfig doesn't add the uuid
automatically?  Is this just a legacy issue?  Or do most people not have
the uuid in the file outputs?

Re: UUID in part files

Posted by Yun Gao <yu...@aliyun.com>.
Hi Dan

Flink SQL adds the uuid by default to cover the case where users execute
multiple bounded SQL jobs that append to the same directory (e.g. a Hive
table); the uuid keeps a later job from overwriting the previous output.

The DataStream API can be viewed as the lower-level API, so it does not
add the uuid automatically. As you pointed out, users can implement the
same behavior themselves with OutputFileConfig.
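
To make the naming concrete, here is a minimal sketch in plain Java. It only mimics the "part-0-0" vs "part-{uuid string}-0-0" pattern from this thread; PartFileNames and its methods are hypothetical helpers, not part of Flink, and the assumed layout is prefix, subtask index, part counter, then suffix. In a real job the prefix would be handed to Flink through OutputFileConfig.builder().withPartPrefix(...) and the sink builder's withOutputFileConfig(...).

```java
import java.util.UUID;

// Hypothetical helper, not a Flink class. It only reproduces the
// "<prefix>-<subtaskIndex>-<partCounter><suffix>" pattern discussed above.
final class PartFileNames {

    // Builds a prefix such as "part-<uuid>" that is unique per job run.
    static String uuidPrefix(String base, UUID runId) {
        return base + "-" + runId;
    }

    // Assembles a part file name from its pieces (assumed layout).
    static String partFileName(String prefix, int subtaskIndex, long partCounter, String suffix) {
        return prefix + "-" + subtaskIndex + "-" + partCounter + suffix;
    }

    public static void main(String[] args) {
        // Default-style name without a uuid.
        System.out.println(partFileName("part", 0, 0, ""));

        // Name with a per-run uuid, so two runs writing to the same
        // directory do not produce colliding file names.
        String prefix = uuidPrefix("part", UUID.randomUUID());
        System.out.println(partFileName(prefix, 0, 0, ""));

        // In a Flink job, this prefix would be passed along the lines of
        // OutputFileConfig.builder().withPartPrefix(prefix).build().
    }
}
```

Generating the uuid once, when the job is built, means all subtasks of one run share the same prefix, while a second run against the same directory gets a different one.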

Best,
 Yun

