Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2019/02/20 00:27:00 UTC

[jira] [Resolved] (CRUNCH-678) Avoid unnecessary retrieval of last modified time

     [ https://issues.apache.org/jira/browse/CRUNCH-678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-678.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.0.0

Thank you [~noslowerdna]!

> Avoid unnecessary retrieval of last modified time
> -------------------------------------------------
>
>                 Key: CRUNCH-678
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-678
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Andrew Olson
>            Assignee: Josh Wills
>            Priority: Major
>             Fix For: 1.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is no guarantee that the last modified time can be retrieved efficiently on every file system. In particular, with object stores and large data sets, retrieval could be very slow. Because this information is not always needed, we should retrieve it only when necessary (i.e. when the write mode is checkpoint) for sources and targets.
> CRUNCH-658 expressed similar concerns for the getSize method. The change proposed here would be a simpler and safer optimization to make.
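
For illustration only, the idea described in the issue could be sketched roughly as follows. This is not the actual CRUNCH-678 patch: the class `LazyLastModified`, the nested `WriteMode` enum, the `UNKNOWN` sentinel, and the `lastModifiedIfNeeded` helper are all hypothetical names, and the file system call is replaced by a plain `LongSupplier` so the sketch runs without Hadoop on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

public class LazyLastModified {
    // Hypothetical stand-in for the pipeline's write modes (illustrative only).
    enum WriteMode { DEFAULT, OVERWRITE, APPEND, CHECKPOINT }

    // Sentinel returned when the timestamp was deliberately not looked up.
    static final long UNKNOWN = -1L;

    /**
     * Invokes the potentially slow lookup only when the write mode is
     * CHECKPOINT; every other mode skips the lookup and gets UNKNOWN.
     */
    static long lastModifiedIfNeeded(WriteMode mode, LongSupplier lookup) {
        return mode == WriteMode.CHECKPOINT ? lookup.getAsLong() : UNKNOWN;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Assumption: in real code this would wrap a file system metadata call.
        LongSupplier slowLookup = () -> {
            calls.incrementAndGet();
            return 1550622420000L;
        };

        long skipped = lastModifiedIfNeeded(WriteMode.OVERWRITE, slowLookup);
        long fetched = lastModifiedIfNeeded(WriteMode.CHECKPOINT, slowLookup);
        System.out.println(skipped + " " + fetched + " lookups=" + calls.get());
    }
}
```

Passing the lookup as a supplier keeps the expensive call out of the common path entirely, rather than fetching the value eagerly and discarding it when it is not needed.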


