Posted to dev@beam.apache.org by Matthew Schneid <ma...@gmail.com> on 2018/11/27 00:56:01 UTC

TextIO setting file dynamically issue

Hello,

I have an interesting issue that I can’t seem to find a reliable resolution to.

I have a standard TextIO output that looks like the following:

TextIO.write().to("gs://<FOLDERPATH>" + new DateTime().toString("HH-mm-ss") + "/Test-")

The above works, and writes to GCS, as I expect it to.

However, it retains the instantiated datetime value, and what I need to happen is for it to dynamically change with the current time.

Is this possible?

Thanks for any and all help that can be provided.

V/R

MS
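
The behavior described in the question follows from ordinary Java evaluation: the argument to to() is a string concatenation, so new DateTime() is read once, when the transform is constructed, and the result is a fixed String. The following is a minimal sketch of that difference, not Beam code. The class name, the two method names, and the counter used as a stand-in clock are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class EagerVsDeferredPath {
    /** Reads the clock once, immediately; the returned String can never change. */
    public static String eagerPath(Supplier<String> clock) {
        return "gs://<FOLDERPATH>/" + clock.get() + "/Test-";
    }

    /** Wraps the same expression in a Supplier so the clock is re-read on every call. */
    public static Supplier<String> deferredPath(Supplier<String> clock) {
        return () -> "gs://<FOLDERPATH>/" + clock.get() + "/Test-";
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a wall clock: each read returns the next "second".
        AtomicInteger tick = new AtomicInteger();
        Supplier<String> clock = () -> String.format("00-00-%02d", tick.getAndIncrement());

        String eager = eagerPath(clock);                 // clock is read here, once
        Supplier<String> deferred = deferredPath(clock); // clock is not read yet

        System.out.println(eager);          // gs://<FOLDERPATH>/00-00-00/Test-
        System.out.println(eager);          // same value again
        System.out.println(deferred.get()); // gs://<FOLDERPATH>/00-00-01/Test-
        System.out.println(deferred.get()); // gs://<FOLDERPATH>/00-00-02/Test-
    }
}
```

A per-element function, such as the by() function discussed in the replies, plays the role of the deferred version: it runs while the pipeline executes, not while it is being built.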

Re: TextIO setting file dynamically issue

Posted by Jeff Klukas <jk...@mozilla.com>.
You can likely achieve what you want using FileIO with dynamic
destinations, which is described in the "Advanced features" section of the
TextIO docs [0].

Your case might look something like:

 PCollection<Event> events = ...;
 events.apply(FileIO.<String, Event>writeDynamic()
       // Destination key: the time string that each element is grouped under.
       .by(event -> formatAsHHMMSS(event.timestamp))
       .via(Contextful.fn(Event::toString), TextIO.sink())
       .withDestinationCoder(StringUtf8Coder.of())
       .to("gs://<FOLDERPATH>/")
       .withNaming(timeString ->
           FileIO.Write.defaultNaming(timeString + "/Test", ".txt")));

This assumes the time you care about is part of the data structure you're
trying to write out. Per Reuven's point, if you wanted to use processing
time instead, your by() function could look more like your initial example:

       .by(event -> formatAsHHMMSS(new DateTime()))
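
The snippets above call formatAsHHMMSS without defining it. One possible shape for such a helper is sketched below. This is an assumption, not code from the thread: it uses java.time rather than the Joda-Time types that Beam and the original question use, so that it runs without extra dependencies, and it fixes the zone to UTC so that every worker produces the same string. The class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimePathFormat {
    // DateTimeFormatter is immutable and thread-safe, so one shared instance is enough.
    // The explicit zone is required to format an Instant with hour/minute/second fields.
    private static final DateTimeFormatter HH_MM_SS =
        DateTimeFormatter.ofPattern("HH-mm-ss").withZone(ZoneOffset.UTC);

    /** Hypothetical helper: renders a timestamp as "HH-mm-ss" for use as a path segment. */
    public static String formatAsHHMMSS(Instant timestamp) {
        return HH_MM_SS.format(timestamp);
    }

    public static void main(String[] args) {
        // Event-time style: the timestamp comes from the record being written.
        System.out.println(formatAsHHMMSS(Instant.parse("2018-11-27T00:56:01Z"))); // 00-56-01
        // Processing-time style: the clock is read when the function is called.
        System.out.println(formatAsHHMMSS(Instant.now()));
    }
}
```

With second-level granularity, each distinct second becomes its own destination, which can mean many small files; a coarser pattern may be a better fit depending on the data volume.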


[0]
https://beam.apache.org/releases/javadoc/2.8.0/index.html?org/apache/beam/sdk/io/TextIO.html#advanced-features

On Tue, Nov 27, 2018 at 6:48 PM Lukasz Cwik <lc...@google.com> wrote:

> +user@beam.apache.org <us...@beam.apache.org>
>
> On Mon, Nov 26, 2018 at 5:33 PM Reuven Lax <re...@google.com> wrote:
>
>> Do you need it to change based on the timestamps of the records being
>> processed, or based on actual current time?
>>
>> On Mon, Nov 26, 2018 at 5:30 PM Matthew Schneid <
>> matthew.t.schneid@gmail.com> wrote:
>>
>>> Hello,
>>>
>>>
>>>
>>> I have an interesting issue that I can’t seem to find a reliable
>>> resolution to.
>>>
>>>
>>>
>>> I have a standard TextIO output that looks like the following:
>>>
>>>
>>>
>>> TextIO.write().to("gs://<FOLDERPATH>" + new DateTime().toString("HH-mm-ss") + "/Test-")
>>>
>>>
>>>
>>> The above works, and writes to GCS, as I expect it to.
>>>
>>>
>>>
>>> However, it retains the instantiated datetime value, and what I need to
>>> happen is for it to dynamically change with the current time.
>>>
>>>
>>>
>>> Is this possible?
>>>
>>>
>>>
>>> Thanks for any and all help that can be provided.
>>>
>>>
>>>
>>> V/R
>>>
>>>
>>>
>>> MS
>>>
>>

Re: TextIO setting file dynamically issue

Posted by Lukasz Cwik <lc...@google.com>.
+user@beam.apache.org <us...@beam.apache.org>

On Mon, Nov 26, 2018 at 5:33 PM Reuven Lax <re...@google.com> wrote:

> Do you need it to change based on the timestamps of the records being
> processed, or based on actual current time?
>
> On Mon, Nov 26, 2018 at 5:30 PM Matthew Schneid <
> matthew.t.schneid@gmail.com> wrote:
>
>> Hello,
>>
>>
>>
>> I have an interesting issue that I can’t seem to find a reliable
>> resolution to.
>>
>>
>>
>> I have a standard TextIO output that looks like the following:
>>
>>
>>
>> TextIO.write().to("gs://<FOLDERPATH>" + new DateTime().toString("HH-mm-ss") + "/Test-")
>>
>>
>>
>> The above works, and writes to GCS, as I expect it to.
>>
>>
>>
>> However, it retains the instantiated datetime value, and what I need to
>> happen is for it to dynamically change with the current time.
>>
>>
>>
>> Is this possible?
>>
>>
>>
>> Thanks for any and all help that can be provided.
>>
>>
>>
>> V/R
>>
>>
>>
>> MS
>>
>

Re: TextIO setting file dynamically issue

Posted by Reuven Lax <re...@google.com>.
Do you need it to change based on the timestamps of the records being
processed, or based on actual current time?

On Mon, Nov 26, 2018 at 5:30 PM Matthew Schneid <ma...@gmail.com>
wrote:

> Hello,
>
>
>
> I have an interesting issue that I can’t seem to find a reliable
> resolution to.
>
>
>
> I have a standard TextIO output that looks like the following:
>
>
>
> TextIO.write().to("gs://<FOLDERPATH>" + new DateTime().toString("HH-mm-ss") + "/Test-")
>
>
>
> The above works, and writes to GCS, as I expect it to.
>
>
>
> However, it retains the instantiated datetime value, and what I need to
> happen is for it to dynamically change with the current time.
>
>
>
> Is this possible?
>
>
>
> Thanks for any and all help that can be provided.
>
>
>
> V/R
>
>
>
> MS
>