You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/04/18 20:57:20 UTC

[GitHub] [beam] Abacn commented on a diff in pull request #17380: [BEAM-14314] Add last_updated field in filesystem.FileMetaData

Abacn commented on code in PR #17380:
URL: https://github.com/apache/beam/pull/17380#discussion_r852409977


##########
sdks/python/apache_beam/io/hadoopfilesystem.py:
##########
@@ -399,6 +407,26 @@ def checksum(self, url):
         file_checksum[_FILE_CHECKSUM_BYTES],
     )
 
+  def metadata(self, url):
+    """Fetch metadata fields of a file on the FileSystem.
+
+    Args:
+      url: string url of a file.
+
+    Returns:
+      :class:`~apache_beam.io.filesystem.FileMetadata`.
+      Note: last_updated field is not supported yet.
+
+    Raises:
+      ``BeamIOError``: if url doesn't exist.
+    """
+    _, path = self._parse_url(url)
+    status = self._hdfs_client.status(path, strict=False)
+    print(status)
+    if status is None:
+      raise BeamIOError('File not found: %s' % url)
+    return FileMetadata(url, status[_FILE_STATUS_LENGTH])

Review Comment:
   I believe so, according to http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#FileStatus, the returned json should include a 'modificationTime' field. Implementing this also involves updating FakeFile class in unit test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org