Posted to user@beam.apache.org by Carlos Alonso <ca...@mrcalonso.com> on 2018/02/13 14:33:10 UTC

How does TextIO decide when to finalise a file?

Hi everyone!!

I'm wondering how a TextIO with dynamic routing knows/decides when to
finalise a file and what happens if after it is finalised, another element
routed for the same file appears.

Thanks!

Re: How does TextIO decide when to finalise a file?

Posted by Eugene Kirpichov <ki...@google.com>.
It is quite complicated. See
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java,
in particular the expand() method. At a high level, it assigns a shard index
to every element, groups the elements by destination and shard index (and
implicitly also by window), and writes each group to its own temporary file,
so one set of temporary files is generated for each trigger firing. Finally,
it renames the temporary files to their final names.
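For a mental model of the flow described above, here is a minimal in-memory Java sketch. It is not Beam's WriteFiles code: the classes WriteFilesSketch, Element, GroupKey and TempFile, the round-robin shard assignment, the ".temp/" names and the final file-name format are all hypothetical. It only illustrates the steps: assign a shard index, group by (destination, shard, window, pane), write one temporary file per group, then rename each temporary file to its final name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the write-then-rename flow; NOT Beam's WriteFiles code.
final class WriteFilesSketch {

  // An element already routed to a destination and assigned to a window and pane.
  record Element(String destination, String window, int pane, String line) {}

  // One group per (destination, shard, window, pane); each group becomes one temp file.
  record GroupKey(String destination, int shard, String window, int pane) {}

  // A temporary file holding one group's lines (the name is a made-up placeholder).
  record TempFile(String name, GroupKey key, List<String> lines) {}

  static Map<String, List<String>> write(List<Element> elements, int numShards) {
    // Step 1: assign a shard index (round-robin here, an assumption) and group by key.
    Map<GroupKey, List<String>> groups = new LinkedHashMap<>();
    int next = 0;
    for (Element e : elements) {
      int shard = next++ % numShards;
      GroupKey key = new GroupKey(e.destination(), shard, e.window(), e.pane());
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(e.line());
    }

    // Step 2: write each group to its own temporary file.
    List<TempFile> temps = new ArrayList<>();
    int id = 0;
    for (Map.Entry<GroupKey, List<String>> g : groups.entrySet()) {
      temps.add(new TempFile(".temp/" + id++, g.getKey(), g.getValue()));
    }

    // Step 3: rename each temporary file to a final name derived from its group key.
    Map<String, List<String>> finalFiles = new LinkedHashMap<>();
    for (TempFile t : temps) {
      GroupKey k = t.key();
      String finalName = String.format("%s/%s-pane-%d-shard-%d-of-%d.txt",
          k.destination(), k.window(), k.pane(), k.shard(), numShards);
      finalFiles.put(finalName, t.lines());
    }
    return finalFiles;
  }

  public static void main(String[] args) {
    List<Element> input = List.of(
        new Element("users", "w0", 0, "alice"),
        new Element("users", "w0", 0, "bob"),
        new Element("orders", "w0", 0, "order-1"));
    write(input, 2).forEach((name, lines) -> System.out.println(name + " -> " + lines));
  }
}
```

Running main prints one line per final file, starting with users/w0-pane-0-shard-0-of-2.txt -> [alice]. Because the pane index is part of the group key in this sketch, an element arriving in a later firing lands in a separate group and therefore a separate file.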

On Tue, Feb 13, 2018 at 12:30 PM Carlos Alonso <ca...@mrcalonso.com> wrote:

> Cool thanks!
>
> How does it work internally? Are all the elements routed to the same path
> grouped and processed within the same bundle?
>
> Thanks!

Re: How does TextIO decide when to finalise a file?

Posted by Carlos Alonso <ca...@mrcalonso.com>.
Cool, thanks!

How does it work internally? Are all the elements routed to the same path
grouped and processed within the same bundle?

Thanks!

On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov <ki...@google.com>
wrote:

> It will do its best to throw an exception if duplicate names are produced
> within one pane. Otherwise, it will overwrite.

Re: How does TextIO decide when to finalise a file?

Posted by Eugene Kirpichov <ki...@google.com>.
It will do its best to throw an exception if duplicate names are produced
within one pane. Otherwise, it will overwrite.
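A minimal sketch of that behaviour, assuming a hypothetical FinalizeSketch class that models the output directory as a map from final file name to contents. Beam's real check is best-effort ("do its best"), whereas this toy version always rejects a duplicate name within a single pane and silently overwrites when the same name comes back in a later pane.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of finalization; NOT Beam's actual code.
final class FinalizeSketch {

  // Stands in for the output directory: final file name -> contents.
  private final Map<String, String> outputDir = new LinkedHashMap<>();

  /** Moves the results of one pane firing (final name -> contents) into place. */
  void finalizePane(List<Map.Entry<String, String>> renames) {
    // Within a single pane, reject two results that map to the same final name.
    Set<String> seen = new HashSet<>();
    for (Map.Entry<String, String> r : renames) {
      if (!seen.add(r.getKey())) {
        throw new IllegalStateException("Duplicate output file name in one pane: " + r.getKey());
      }
    }
    // Across panes there is no such check: a later pane reusing a name overwrites the file.
    for (Map.Entry<String, String> r : renames) {
      outputDir.put(r.getKey(), r.getValue());
    }
  }

  Map<String, String> files() {
    return outputDir;
  }

  public static void main(String[] args) {
    FinalizeSketch fs = new FinalizeSketch();
    fs.finalizePane(List.of(Map.entry("out/a.txt", "pane 0")));
    fs.finalizePane(List.of(Map.entry("out/a.txt", "pane 1"))); // overwrites silently
    System.out.println(fs.files()); // {out/a.txt=pane 1}
    try {
      fs.finalizePane(List.of(Map.entry("out/b.txt", "x"), Map.entry("out/b.txt", "y")));
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The duplicate check runs before anything is written, so a rejected pane leaves no partial output in this model.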

On Tue, Feb 13, 2018 at 11:58 AM Carlos Alonso <ca...@mrcalonso.com> wrote:

> Cool, thanks.
>
> What if the destination is not properly coded and the File naming policy
> then produces a duplicated path? Will it throw an exception? Overwrite?
>
> Thanks!

Re: How does TextIO decide when to finalise a file?

Posted by Carlos Alonso <ca...@mrcalonso.com>.
Cool, thanks.

What if the destination is not properly coded and the file naming policy
then produces a duplicate path? Will it throw an exception, or overwrite?

Thanks!

On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Dynamic file writes generate 1 set of files (shards) for every pane firing
> of every window of every destination. File naming policy is required to
> produce different names for every combination of (destination, shard index,
> window, pane) so you never have to append or overwrite. A new element
> arriving for a destination after something for that destination has already
> been written will simply be in the next pane, or in a different window.

Re: How does TextIO decide when to finalise a file?

Posted by Eugene Kirpichov <ki...@google.com>.
Dynamic file writes generate one set of files (shards) for every pane firing
of every window of every destination. The file naming policy is required to
produce a different name for every combination of (destination, shard index,
window, pane), so you never have to append to or overwrite a file. A new
element that arrives for a destination after something for that destination
has already been written will simply land in the next pane, or in a different
window.
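To make the uniqueness requirement concrete, here is a hypothetical naming function that encodes every component listed above. The class NamingSketch and its name format are made up for illustration; they are not the format produced by Beam's built-in naming policies, and the window is represented as a plain string for simplicity.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical naming scheme for illustration; not Beam's built-in format.
final class NamingSketch {

  // Encodes every component that must be unique: destination, window, pane and shard.
  static String fileName(String destination, String window, int paneIndex,
                         int shardIndex, int numShards) {
    return String.format("%s/%s-pane-%d-%05d-of-%05d.txt",
        destination, window, paneIndex, shardIndex, numShards);
  }

  public static void main(String[] args) {
    Set<String> names = new HashSet<>();
    // First firing (pane 0) of window w0 for destination "users", 2 shards.
    names.add(fileName("users", "w0", 0, 0, 2));
    names.add(fileName("users", "w0", 0, 1, 2));
    // A late element for "users" in the same window arrives in the next firing (pane 1),
    // so it goes to a new file instead of being appended to an existing one.
    String late = fileName("users", "w0", 1, 0, 2);
    System.out.println(late);            // users/w0-pane-1-00000-of-00002.txt
    System.out.println(names.add(late)); // true: the name does not collide
  }
}
```

If the pane index were dropped from the format, the late element's file would get the same name as shard 0 of the first firing, which is the duplicate-name situation discussed elsewhere in this thread.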

On Tue, Feb 13, 2018, 6:33 AM Carlos Alonso <ca...@mrcalonso.com> wrote:

> Hi everyone!!
>
> I'm wondering how a TextIO with dynamic routing knows/decides when to
> finalise a file and what happens if after it is finalised, another element
> routed for the same file appears.
>
> Thanks!