Posted to dev@flink.apache.org by dhurandar S <dh...@gmail.com> on 2020/05/12 21:13:04 UTC

changing the output files names in Streamfilesink from part-00 to something else

We want to change the names of the files generated as the output of our
StreamingFileSink. When files are generated they are named part-00*. Is
there a way we can change the name?

In Hadoop, we can change RecordWriters and MultipleOutputs. May I please
have some help in this regard? This is a blocker for us and will force us
to move to an MR job.

-- 
Thank you and regards,
Dhurandar

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by Danny Chan <yu...@gmail.com>.
The StreamingFileSink can take an OutputFileConfig [1] to configure the prefix and suffix of the part files. Does that work for you?

[1] https://github.com/apache/flink/blob/1d9d0bf582a79ed5cba4ec096e9c12fe5618bcf7/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/StreamingFileSink.java#L71
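
To make the prefix/suffix option concrete, here is a minimal, Flink-free sketch of how the final part file name is composed. It only models the naming scheme: the class and method names are hypothetical, and reading the two numbers as subtask index and part counter is an assumption based on the <prefix>-0-0<suffix> pattern mentioned elsewhere in this thread. In a real job the prefix and suffix would be set through OutputFileConfig.builder().withPartPrefix(...).withPartSuffix(...).build() and handed to the sink builder via withOutputFileConfig(...).

```java
// Hypothetical model of the part file naming scheme; this is not Flink code.
final class PartFileNameSketch {

    /** Composes "<prefix>-<subtaskIndex>-<partCounter><suffix>". */
    static String partFileName(String prefix, int subtaskIndex, long partCounter, String suffix) {
        return prefix + "-" + subtaskIndex + "-" + partCounter + suffix;
    }

    public static void main(String[] args) {
        // Default-style name: prefix "part", empty suffix.
        System.out.println(partFileName("part", 0, 0, ""));
        // Custom prefix and suffix, fixed for the lifetime of the sink.
        System.out.println(partFileName("orgdata", 0, 0, ".txt"));
    }
}
```

Because the prefix and suffix are fixed when the sink is built, this option alone does not give per-record file names.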

Best,
Danny Chan
On May 14, 2020 at 2:05 AM +0800, dhurandar S <dh...@gmail.com> wrote:
>
> StreamFileSink

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by Sivaprasanna <si...@gmail.com>.
Hi

Just sharing my thoughts. Based on what you have described so far, I think
your objective is to have some unique way to identify/filter the output
based on the organization. If that's the case, you can implement a
BucketAssigner with the logic to create a bucket key based on the
organization data.
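
A minimal sketch of the bucket-key logic, written without Flink dependencies so that it runs on its own. The Event type, its organizationId field, the "unknown-org" fallback, and the sanitizing rule are all assumptions made for illustration. In a Flink job this logic would sit in the getBucketId(element, context) method of a custom BucketAssigner<Event, String>.

```java
// Hypothetical stand-alone model of the bucket-key logic; this is not Flink code.
final class OrgBucketKeySketch {

    /** Assumed event shape; the thread only says the organization ID is part of the event. */
    static final class Event {
        final String organizationId;
        final String payload;

        Event(String organizationId, String payload) {
            this.organizationId = organizationId;
            this.payload = payload;
        }
    }

    /**
     * Derives the bucket key from the event. Characters outside [A-Za-z0-9_-] are
     * replaced with '_' so the key is safe to use as a directory name (assumed rule).
     */
    static String bucketIdFor(Event event) {
        if (event.organizationId == null || event.organizationId.isEmpty()) {
            return "unknown-org";
        }
        return event.organizationId.replaceAll("[^A-Za-z0-9_-]", "_");
    }

    public static void main(String[] args) {
        System.out.println(bucketIdFor(new Event("acme", "a")));
        System.out.println(bucketIdFor(new Event("acme/eu", "b")));
    }
}
```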

Cheers,
Sivaprasanna

On Thu, May 14, 2020 at 12:13 PM Jingsong Li <ji...@gmail.com> wrote:

> Hi, Dhurandar,
>
> Can you describe your needs? Why do you need to modify file names
> flexibly? What kind of name do you want?
>
> Best,
> Jingsong Lee
>
> On Thu, May 14, 2020 at 2:05 AM dhurandar S <dh...@gmail.com>
> wrote:
>
>> Yes, we looked at it.
>> The problem is that the file name is generated dynamically: we derive
>> it from the incoming data, based on which organization the data
>> belongs to.
>>
>> Is there any way we can achieve this?
>>
>> On Tue, May 12, 2020 at 8:38 PM Yun Gao <yu...@aliyun.com> wrote:
>>
>>> Hi Dhurandar:
>>>
>>>     Currently StreamingFileSink should be able to change the prefix and
>>> suffix of the filename [1]; the name can be changed to something like
>>> <prefix>-0-0<suffix>. Could this solve your problem?
>>>
>>>
>>>  Best,
>>>   Yun
>>>
>>>
>>>
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#part-file-configuration
>>>
>>>
>>>
>>> ------------------------------------------------------------------
>>> From: dhurandar S<dh...@gmail.com>
>>> Date: 2020-05-13 05:13:04
>>> To: user<us...@flink.apache.org>; <de...@flink.apache.org>
>>> Subject: changing the output files names in Streamfilesink from part-00 to
>>> something else
>>>
>>> We want to change the names of the files generated as the output of
>>> our StreamingFileSink. When files are generated they are named part-00*.
>>> Is there a way we can change the name?
>>>
>>> In Hadoop, we can change RecordWriters and MultipleOutputs. May I please
>>> have some help in this regard? This is a blocker for us and will force us
>>> to move to an MR job.
>>>
>>> --
>>> Thank you and regards,
>>> Dhurandar
>>>
>>>
>>>
>>
>> --
>> Thank you and regards,
>> Dhurandar
>>
>>
>
> --
> Best, Jingsong Lee
>

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by Jingsong Li <ji...@gmail.com>.
Hi Rahul,

Thanks for explaining. I see. Currently there is no way to dynamically
control the file name in StreamingFileSink.

If the number of organizations is not too large, then, as Sivaprasanna said,
you can use a "BucketAssigner" to create a bucket per organization ID. A
bucket in StreamingFileSink is like a partition in Hive/Spark: the
information is in the directory name, so each organization gets its own
directory.
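
To illustrate the partition-like layout, the sketch below groups a few sample records by organization ID and prints the directory each one would land in. It is plain Java rather than Flink code; the base path, the sample IDs, and the helper names are made up for the example. The point is that a previously unseen organization ID simply yields a new directory, with no registration step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the per-organization directory layout; this is not Flink code.
final class BucketLayoutSketch {

    /** Groups record payloads by "<basePath>/<organizationId>"; each record is {orgId, payload}. */
    static Map<String, List<String>> layout(String basePath, List<String[]> records) {
        Map<String, List<String>> directories = new TreeMap<>();
        for (String[] record : records) {
            String organizationId = record[0];
            String payload = record[1];
            // A new organization ID gets its directory entry the first time it is seen.
            directories
                .computeIfAbsent(basePath + "/" + organizationId, dir -> new ArrayList<>())
                .add(payload);
        }
        return directories;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"org-a", "event-1"});
        records.add(new String[] {"org-b", "event-2"});
        records.add(new String[] {"org-a", "event-3"});
        layout("/output", records)
            .forEach((dir, payloads) -> System.out.println(dir + " -> " + payloads));
    }
}
```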

Best,
Jingsong Lee

On Tue, May 19, 2020 at 2:03 AM dhurandar S <dh...@gmail.com> wrote:

> Hi Jingsong,
>
> We have a system where organizations keep getting added and removed on a
> regular basis. As new organizations get added, data from these
> organizations starts flowing into the streaming system. We group by the
> Organisation ID, which is part of the incoming event. If we find an
> Organisation ID in the incoming stream that we have not seen before, we
> create a new file and start writing data into it. So this is dynamic, in
> that it is driven by the incoming stream.
>
> regards,
> Rahul

-- 
Best, Jingsong Lee

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by dhurandar S <dh...@gmail.com>.
Hi Jingsong,

We have a system where organizations keep getting added and removed on a
regular basis. As new organizations get added, data from these
organizations starts flowing into the streaming system. We group by the
Organisation ID, which is part of the incoming event. If we find an
Organisation ID in the incoming stream that we have not seen before, we
create a new file and start writing data into it. So this is dynamic, in
that it is driven by the incoming stream.

regards,
Rahul


-- 
Thank you and regards,
Dhurandar

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by Jingsong Li <ji...@gmail.com>.
Hi, Dhurandar,

Can you describe your needs? Why do you need to modify file names flexibly?
What kind of name do you want?

Best,
Jingsong Lee

-- 
Best, Jingsong Lee

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by dhurandar S <dh...@gmail.com>.
Yes, we looked at it.
The problem is that the file name is generated dynamically: we derive it
from the incoming data, based on which organization the data belongs to.

Is there any way we can achieve this?

-- 
Thank you and regards,
Dhurandar

Re: changing the output files names in Streamfilesink from part-00 to something else

Posted by Yun Gao <yu...@aliyun.com.INVALID>.
Hi Dhurandar:

    Currently StreamingFileSink should be able to change the prefix and suffix of the filename [1]; the name can be changed to something like <prefix>-0-0<suffix>. Could this solve your problem?


 Best,
  Yun




[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html#part-file-configuration


