Posted to dev@crunch.apache.org by "Andrew Olson (JIRA)" <ji...@apache.org> on 2019/02/19 22:58:01 UTC
[jira] [Created] (CRUNCH-678) Avoid unnecessary retrieval of last modified time
Andrew Olson created CRUNCH-678:
-----------------------------------
Summary: Avoid unnecessary retrieval of last modified time
Key: CRUNCH-678
URL: https://issues.apache.org/jira/browse/CRUNCH-678
Project: Crunch
Issue Type: Improvement
Components: Core
Reporter: Andrew Olson
Assignee: Josh Wills
There is no guarantee that the last modified time can be retrieved efficiently on every file system. With object stores and large data sets in particular, it can be very slow. Since this information is not always needed, we should retrieve it for sources and targets only when necessary (i.e., when the write mode is checkpoint).
CRUNCH-658 expressed similar concerns for the getSize method. This would be a simpler and safer optimization to make.
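
A minimal sketch of the idea, not the actual Crunch change: the timestamp lookup is wrapped in a supplier and only invoked when the write mode is checkpoint. The class name, the WriteMode enum, the NOT_RETRIEVED sentinel and the lastModifiedIfNeeded helper are all hypothetical stand-ins introduced for illustration; the real code would operate on Crunch sources and targets and on the Hadoop file system.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Illustrative sketch only; none of these names are Crunch APIs.
final class LastModifiedSketch {

  // Hypothetical stand-in for the pipeline's write mode.
  enum WriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

  // Hypothetical sentinel meaning "the timestamp was never looked up".
  static final long NOT_RETRIEVED = -1L;

  // Runs the (potentially slow) lookup only when the mode needs the timestamp.
  static long lastModifiedIfNeeded(WriteMode mode, LongSupplier lookup) {
    if (mode != WriteMode.CHECKPOINT) {
      return NOT_RETRIEVED; // skip the file system call entirely
    }
    return lookup.getAsLong();
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Stand-in for an expensive listing against an object store.
    LongSupplier slowLookup = () -> {
      calls.incrementAndGet();
      return System.currentTimeMillis();
    };

    for (WriteMode mode : WriteMode.values()) {
      long ts = lastModifiedIfNeeded(mode, slowLookup);
      System.out.println(mode + " -> " + ts + " (lookups so far: " + calls.get() + ")");
    }
  }
}
```

Passing the lookup as a supplier keeps the decision in one place: callers that never run in checkpoint mode never pay for the file system call.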