Posted to dev@beam.apache.org by Jeff Klukas <jk...@mozilla.com> on 2018/10/29 18:08:37 UTC

FileSystems should retrieve lastModified time

I just wrote up a JIRA issue proposing that FileSystem implementations
retrieve lastModified time of the files they list:
https://issues.apache.org/jira/browse/BEAM-5910

Any immediate concerns? I'm not intimately familiar with HDFS, but I'm
otherwise confident that GCS, S3, and local filesystems can all give us a
suitable timestamp.

In the short term, this change would allow users to write their own polling
logic on top of FileSystems to periodically check for updates to files.
Currently, you would need to fall back to the APIs for each individual
storage provider.
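
To make the short-term use case concrete, here is a minimal sketch of such
polling logic in plain Java. It assumes each listing entry carries a
last-modified timestamp. FileStatus and ModificationTracker are hypothetical
names used only for illustration, not Beam classes, and the listings are
faked in memory rather than obtained from FileSystems.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Demo driver: feeds two fake listings through the tracker. */
class PollingSketch {
    public static void main(String[] args) {
        ModificationTracker tracker = new ModificationTracker();

        // First poll: every file is new, so every path is reported.
        System.out.println(tracker.update(Arrays.asList(
                new FileStatus("bucket/a.txt", 100L),
                new FileStatus("bucket/b.txt", 100L))));

        // Second poll: only the file with a newer timestamp is reported.
        System.out.println(tracker.update(Arrays.asList(
                new FileStatus("bucket/a.txt", 100L),
                new FileStatus("bucket/b.txt", 250L))));
    }
}

/** Hypothetical stand-in for per-file metadata that includes a timestamp. */
final class FileStatus {
    final String path;
    final long lastModifiedMillis;

    FileStatus(String path, long lastModifiedMillis) {
        this.path = path;
        this.lastModifiedMillis = lastModifiedMillis;
    }
}

/** Remembers the newest timestamp seen per path across polls. */
final class ModificationTracker {
    private final Map<String, Long> lastSeen = new HashMap<>();

    /** Returns the paths that are new or have a newer timestamp than before. */
    List<String> update(List<FileStatus> listing) {
        List<String> changed = new ArrayList<>();
        for (FileStatus file : listing) {
            Long previous = lastSeen.get(file.path);
            if (previous == null || file.lastModifiedMillis > previous) {
                changed.add(file.path);
                lastSeen.put(file.path, file.lastModifiedMillis);
            }
        }
        return changed;
    }
}
```

In real use, each listing would come from a periodic match call against the
filesystem. The tracker only decides which paths count as new or updated, so
it stays independent of the storage provider.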

Longer term, I'd love to see FileIO.match().continuously() support an option
for returning updated contents when files are updated.

Re: FileSystems should retrieve lastModified time

Posted by Chamikara Jayalath <ch...@google.com>.
+1 for adding last modified time to MatchResult.Metadata.

Sounds like a useful change that will enable additional use-cases.

- Cham

On Mon, Oct 29, 2018 at 11:08 AM Jeff Klukas <jk...@mozilla.com> wrote:

> I just wrote up a JIRA issue proposing that FileSystem implementations
> retrieve lastModified time of the files they list:
> https://issues.apache.org/jira/browse/BEAM-5910
>
> Any immediate concerns? I'm not intimately familiar with HDFS, but I'm
> otherwise confident that GCS, S3, and local filesystems can all give us a
> suitable timestamp.
>
> In the short term, this change would allow users to write their own
> polling logic on top of FileSystems to periodically check for updates to
> files. Currently, you would need to fall back to the APIs for each
> individual storage provider.
>
> Longer term, I'd love to see FileIO.match.continuously support an option
> for returning updated contents when files are updated.
>