You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Yoram Arnon (JIRA)" <ji...@apache.org> on 2006/03/17 03:17:13 UTC

[jira] Created: (HADOOP-89) files are not visible until they are closed

files are not visible until they are closed
-------------------------------------------

         Key: HADOOP-89
         URL: http://issues.apache.org/jira/browse/HADOOP-89
     Project: Hadoop
        Type: Bug
  Components: dfs  
    Versions: 0.1    
    Reporter: Yoram Arnon
    Priority: Critical


the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
1. no practical way to know if a file/job is progressing
2. no way to implement files that never close, such as log files
3. failure to close a file results in loss of the file

The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: tail2.patch

merged patch with latest trunk.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail2.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment:     (was: tail2.patch)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail3.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518884 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

This patch does not change APIs and disk formats. The idea is that files that are created by clients appear immediately in the namespace. Other clients can see the file and get its attributes. This is a big change in semantics and I would rather do this in the beginning of a release than towards the end of the release cycle. This is a step in implementing full-fledged "appends" for HDFS files (HADOOP-1700).

Regarding "removing pending creates" structures, I agree that they should move into a new kind of inode. I can do it after you introduce the concept of class-hierarchy of inodes (HADOOP-1687).

Please let me know if the above sounds good to you and makes you agree that this patch is commit-able.




> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HADOOP-89:
--------------------------------------

    Assignee: dhruba borthakur  (was: Sameer Paranjpye)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: tail4.patch

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Fix Version/s: 0.15.0
           Status: Patch Available  (was: Open)

Marking it Patch Available to trigger automatic tests,

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "Wendy Chien (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-89?page=all ]

Wendy Chien updated HADOOP-89:
------------------------------

    Status: Open  (was: Patch Available)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: http://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Priority: Critical
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "Wendy Chien (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-89?page=all ]

Wendy Chien updated HADOOP-89:
------------------------------

    Status: Patch Available  (was: Open)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: http://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Priority: Critical
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539814 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

That was what was decided in the last group meeting between you, me, Owen,
Sameer, etc. That's why I introduced the changes to FsShell.java.






> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: tail.patch

1. merged patch with latest trunk.
2. Re-introduced the "tail" command from dfs shell.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail.patch, tail3.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528626 ] 

Konstantin Shvachko commented on HADOOP-89:
-------------------------------------------

> 2. Re-introduced the "tail" command from dfs shell.

What is the point of reviews and discussions then?

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528221 ] 

Konstantin Shvachko commented on HADOOP-89:
-------------------------------------------

+1
My only concern is that in order to add or remove blocks you reallocate the array of blocks
so that it's length always equals the number of blocks in the file.
I tested this patch with small files (1 or 2 blocks) and did not see any degradation in performance.
Since our average file size is 1.5 blocks this is good. 
IMO we should still test it for bigger files to make sure there is no slow-down.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail3.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531423 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

It is actually HADOOP-1708 that introduced the feature that "files appear in namespace as soon as they are created". I changed the release notes (CHANGES.txt) to ensure that this change appears in the INCOMPATIBLE section rather than in NEW FEATURES section.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530283 ] 

Milind Bhandarkar commented on HADOOP-89:
-----------------------------------------

This feature alters user-facing behavior. Until now, a file's existence was an indication that the task producing the file is complete. Please mark it such in release notes. Thanks.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526659 ] 

Konstantin Shvachko commented on HADOOP-89:
-------------------------------------------

This patch does not apply anymore.

With HADOOP-1700 on the horizon does it make sense to introduce tail -f here?
Tail -f works until the client that is writing to the file is alive. If it dies before closing the file all information
reported by tail -f does not exist in the system anymore. So in a way tail -f reports an illusive data that is not 
guaranteed to be present in the future.
This patch has a lot of important internal changes, like removing pendingCreates etc.
But introducing tail -f at this point may cause confusion among potential users.


> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail2.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528925 ] 

Konstantin Shvachko commented on HADOOP-89:
-------------------------------------------

Dhruba convinced me that it is worth committing the patch with the tail functionality in it.
There are 2 types of failures that lead to a loss of entire file data. 
# The name-node  failure, and 
# the client failure.

If the name-node dies the length of an incomplete file will be set to 0, which correspond to the current behavior, when we just loose the entire file.
If the client dies the name-node automatically closes all files created by the client as long as it detects the client lease expiration.
The last one is the most common case of failure, and the code provides protection for from loosing data in the case.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment:     (was: atomicCreation.patch)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529081 ] 

Hudson commented on HADOOP-89:
------------------------------

Integrated in Hadoop-Nightly #243 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/243/])

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523170 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

1. Contents of new blocks that are appended to a file is visible to clients as soon as the datanode reports the block to the namenode. This means that data is visible to clients even when the block metadata is not yet persisted on disk. This approach lets us avoid a fs-transaction into the edit log for every new block allocation.

2. The block allocation for a file is persisted in the edit log when the file is closed.

3. A new API FSDataOutputStream.sync() allows an application to make data persistent on disk even before the file is closed. The invocation of this API causes a transaction to be logged into the edits log to record the blocks that are currently allocated to the file. An application that is recording data to a log file will periodically invoke this API to ensure that the contents of the log file persist even if the application dies before closing the file.

4. The FsShell utility has a new command that is invoked as "bin/hadoop dfs -tail [-f] <filename>". When the "-f" option is used, the FsShell utility will periodically poll for changes to the filesize. When a filesize change is detected, it will re-open the file and will display the new contents that were added to the file.


> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528644 ] 

Hadoop QA commented on HADOOP-89:
---------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12366109/tail4.patch
against trunk revision r577010.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/785/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/785/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/785/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/785/console

This message is automatically generated.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment:     (was: atomicCreation.patch)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518853 ] 

Konstantin Shvachko commented on HADOOP-89:
-------------------------------------------

I see 2 problems with this patch:
# It does not close any of the 3 issues raised in the description above.
You can see files in the ls, but you cannot see file progress and you still loose all accumulated data if the file is not closed.
# Implementation-wise I would expect that when files are made visible we will get rid of the pending create structure, 
but it is still there, so now you will have to synchronize pending create data with the corresponding inode.

I'd propose to make files both visible and readable while they are still being created. Namely, we should let
clients read portions of the file that have been written and replicated on data-nodes.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: atomicCreation.patch

This patch makes a file visible in the file system as soon as it is created by an application. However, the data blocks are associated with a file when the file gets closed.

If a DFS client A has created a file and is writing data to it, another DFS client will see the file size as zero until A closes the file.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: Sameer Paranjpye
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: atomicCreation.patch

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by Nigel Daley <nd...@yahoo-inc.com>.
This is an incompatible change and as such should be in the  
INCOMPATIBLE CHANGES section of CHANGES.TXT

On Sep 25, 2007, at 2:51 PM, dhruba borthakur (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-89? 
> page=com.atlassian.jira.plugin.system.issuetabpanels:comment- 
> tabpanel#action_12530304 ]
>
> dhruba borthakur commented on HADOOP-89:
> ----------------------------------------
>
> The CHANGES.txt file lists this issue under the section titled "NEW  
> FEATURES". So, this should figure prominently in the release notes.
>
>> files are not visible until they are closed
>> -------------------------------------------
>>
>>                 Key: HADOOP-89
>>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>>             Project: Hadoop
>>          Issue Type: Bug
>>          Components: dfs
>>    Affects Versions: 0.1.0
>>            Reporter: Yoram Arnon
>>            Assignee: dhruba borthakur
>>            Priority: Critical
>>             Fix For: 0.15.0
>>
>>         Attachments: tail.patch, tail3.patch, tail4.patch
>>
>>
>> the current behaviour, whereby a file is not visible until it is  
>> closed has several flaws,including:
>> 1. no practical way to know if a file/job is progressing
>> 2. no way to implement files that never close, such as log files
>> 3. failure to close a file results in loss of the file
>> The part of the file that's written should be visible.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530304 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

The CHANGES.txt file lists this issue under the section titled "NEW FEATURES". So, this should figure prominently in the release notes.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: tail.patch

This patch implements 1, 2 and 4 of the above.

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment:     (was: tail.patch)

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail2.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-89:
-----------------------------------

    Attachment: tail3.patch

Merged patch with latest trunk. Konstantin's view was that user's might get confused if they can view data in a file that gets thrown out if the namenode restarts. Keeping with his view, i have removed the "tail" command from this patch.

A follow-on patch will introduce a new API "flush" that will persist blocks before file is closed. 

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: tail3.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-89) files are not visible until they are closed

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528910 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

The contrib test failure si nto related to this patch, being tracked by HADOOP-1924

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>             Fix For: 0.15.0
>
>         Attachments: tail.patch, tail3.patch, tail4.patch
>
>
> the current behaviour, whereby a file is not visible until it is closed has several flaws,including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.