Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2007/03/07 20:57:24 UTC

[jira] Created: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

updating a hdfs file, doesn't cause the distributed file cache to update itself
-------------------------------------------------------------------------------

                 Key: HADOOP-1084
                 URL: https://issues.apache.org/jira/browse/HADOOP-1084
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.0
            Reporter: Owen O'Malley
         Assigned To: Mahadev konar
             Fix For: 0.13.0


If I delete and upload a new version of a file /user/owen/foo to HDFS and start my job with hdfs://user/owen/foo as a cached file, it will use the previous contents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1084:
----------------------------------

    Attachment: HADOOP-1084_1_20070716.patch

First take on using file timestamps in lieu of the MD5 of CRCs for the file cache; I'll continue testing this...



[jira] Updated: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1084:
----------------------------------

    Fix Version/s:     (was: 0.13.0)



[jira] Updated: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1084:
----------------------------------

    Fix Version/s: 0.14.0
         Priority: Blocker  (was: Major)

With HADOOP-1134 slated for 0.14, we need to fix this since CRCs are going away... hence I'm marking this as a BLOCKER.



[jira] Commented: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508965 ] 

Owen O'Malley commented on HADOOP-1084:
---------------------------------------

With the block-level CRCs coming, we should move to file timestamps rather than checksums. It is less precise, but the CRC information is going away.
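The idea above can be sketched in a few lines: key the cache on the source file's modification timestamp instead of a checksum, so a deleted-and-re-uploaded file is seen as changed. This is a minimal illustrative sketch in Python against the local filesystem, not Hadoop's actual (Java) implementation; the class and method names are hypothetical.

```python
import os


class TimestampCache:
    """Sketch of timestamp-based cache invalidation: remember each source
    file's modification time when it is cached, and treat the cached copy
    as stale whenever the current mtime differs."""

    def __init__(self):
        self._stamps = {}

    def mark_cached(self, path):
        # Record the source's mtime at the moment it is localized.
        self._stamps[path] = os.path.getmtime(path)

    def is_up_to_date(self, path):
        # Valid only if we cached this path and its mtime is unchanged.
        stamp = self._stamps.get(path)
        return stamp is not None and stamp == os.path.getmtime(path)
```

A timestamp is coarser than a checksum (a rewrite with identical contents still invalidates the cache, and an in-place edit that preserves the mtime would be missed), but it needs no CRC metadata from the filesystem.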



[jira] Updated: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1084:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1084:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Arun!



[jira] Commented: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513926 ] 

Hadoop QA commented on HADOOP-1084:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12362132/HADOOP-1084_2_20070719.patch applied and successfully tested against trunk revision r557118.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/433/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/433/console



[jira] Assigned: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned HADOOP-1084:
-------------------------------------

    Assignee: Arun C Murthy  (was: Mahadev konar)



[jira] Updated: (HADOOP-1084) updating a hdfs file, doesn't cause the distributed file cache to update itself

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1084:
----------------------------------

    Attachment: HADOOP-1084_2_20070719.patch

Patch updated due to changes in trunk; also added checks to ensure that changes to the files to be cached on HDFS _after_ the job has started are caught, and punished! *smile*
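The second check described here can be sketched as: snapshot each cache file's timestamp at job-submission time, then fail fast if any file is modified afterwards. Again a hypothetical Python sketch against the local filesystem, not the patch's actual Java code; the function names are illustrative.

```python
import os


def snapshot_timestamps(paths):
    """Record the mtime of every cache file at job-submission time."""
    return {path: os.path.getmtime(path) for path in paths}


def check_unchanged(snapshot):
    """Raise if any cached file was modified after the snapshot was taken,
    so a mid-job change is caught instead of silently producing tasks that
    see different file contents."""
    for path, stamp in snapshot.items():
        if os.path.getmtime(path) != stamp:
            raise RuntimeError(
                "cache file %s changed after job submission" % path)
```

A task runner would call `check_unchanged` before localizing each file, turning a silently inconsistent cache into an explicit job failure.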
