Posted to dev@crunch.apache.org by "Andrew Olson (JIRA)" <ji...@apache.org> on 2019/02/19 22:58:01 UTC

[jira] [Created] (CRUNCH-678) Avoid unnecessary retrieval of last modified time

Andrew Olson created CRUNCH-678:
-----------------------------------

             Summary: Avoid unnecessary retrieval of last modified time
                 Key: CRUNCH-678
                 URL: https://issues.apache.org/jira/browse/CRUNCH-678
             Project: Crunch
          Issue Type: Improvement
          Components: Core
            Reporter: Andrew Olson
            Assignee: Josh Wills


There is no guarantee that the last modified time can be retrieved efficiently on every file system. With object stores and large data sets in particular, retrieval can be very slow. Since this information is not always needed, we should retrieve it for sources and targets only when necessary (i.e. when the write mode is checkpoint).

CRUNCH-658 expressed similar concerns for the getSize method. The change proposed here would be a simpler and safer optimization to make.
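
As a rough illustration of the idea (not the actual Crunch change), the lookup could be deferred behind a supplier and invoked only in checkpoint mode. Everything in this sketch is hypothetical: the LazyLastModified class, its WriteMode enum (a stand-in, not the Crunch API), the UNKNOWN sentinel, and the lastModifiedIfNeeded helper are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical sketch: defer the (possibly slow) last-modified lookup
// until a write mode that actually needs it is in effect.
final class LazyLastModified {

    // Stand-in for the pipeline's write modes; an assumption, not the Crunch API.
    enum WriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

    // Sentinel returned when the timestamp was deliberately not retrieved.
    static final long UNKNOWN = -1L;

    // Invokes the lookup only when the write mode is CHECKPOINT.
    static long lastModifiedIfNeeded(WriteMode mode, LongSupplier lookup) {
        return mode == WriteMode.CHECKPOINT ? lookup.getAsLong() : UNKNOWN;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulates an expensive file system call and counts invocations.
        LongSupplier lookup = () -> {
            calls.incrementAndGet();
            return 42L;
        };

        long skipped = lastModifiedIfNeeded(WriteMode.OVERWRITE, lookup);
        long fetched = lastModifiedIfNeeded(WriteMode.CHECKPOINT, lookup);
        System.out.println(skipped + " " + fetched + " calls=" + calls.get());
    }
}
```

With this shape, callers in non-checkpoint modes never touch the file system for the timestamp, which is the saving the issue is after for object stores.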


