Posted to user@flink.apache.org by Dan Hill <qu...@gmail.com> on 2021/02/07 23:40:07 UTC
UUID in part files
Hi.
*Context*
I'm migrating my Flink SQL job to DataStream. When switching to
StreamingFileSink, I noticed that the part files no longer have a UUID in
them: "part-0-0" instead of "part-{uuid string}-0-0". This is easy to add
with OutputFileConfig.
*Question*
Is there a reason why the base OutputFileConfig doesn't add the uuid
automatically? Is this just a legacy issue? Or do most people not have
the uuid in the file outputs?
Re: UUID in part files
Posted by Yun Gao <yu...@aliyun.com>.
Hi Dan
Flink SQL adds the UUID by default to cover the case where users execute
multiple bounded SQL jobs that append to the same directory (a Hive table);
the UUID keeps a later job from overwriting the previous output.
The DataStream API can be viewed as the lower-level API, so it does not
add the UUID automatically. As you pointed out, users can implement the
same behavior themselves with OutputFileConfig.
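
For illustration, here is a minimal sketch of how such a prefix could be
built. It uses only the JDK so that it runs on its own; the class and method
names (PartPrefixSketch, uuidPartPrefix, partFileName) are hypothetical, and
partFileName only imitates the "<prefix>-<subtask index>-<part counter><suffix>"
pattern seen in the file names above. The Flink wiring is shown in a comment
and assumes the OutputFileConfig builder with withPartPrefix.

```java
import java.util.UUID;

public class PartPrefixSketch {

    // Build a part prefix such as "part-<uuid>". Generating it once, when the
    // job is constructed, means every subtask of that job run shares the same
    // UUID, while a second job run gets a different one.
    static String uuidPartPrefix() {
        return "part-" + UUID.randomUUID();
    }

    // Hypothetical helper that imitates how a part file name is composed:
    // <prefix>-<subtaskIndex>-<partCounter><suffix>
    static String partFileName(String prefix, int subtaskIndex, long partCounter, String suffix) {
        return prefix + "-" + subtaskIndex + "-" + partCounter + suffix;
    }

    public static void main(String[] args) {
        String prefix = uuidPartPrefix();

        // In a Flink job the prefix would be handed to the sink, roughly:
        //
        //   OutputFileConfig config = OutputFileConfig.builder()
        //           .withPartPrefix(prefix)
        //           .build();
        //   // ... then pass it to the StreamingFileSink builder via
        //   // withOutputFileConfig(config)

        // Prints something of the form part-<uuid>-0-0
        System.out.println(partFileName(prefix, 0, 0, ""));
    }
}
```

With a fixed prefix, two bounded jobs writing to the same directory would both
produce "part-0-0"; with the UUID in the prefix, each run writes under its own
set of names.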
Best,
Yun