Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2006/11/06 18:04:38 UTC

[jira] Commented: (HADOOP-656) dfs locking doesn't notify the application when a lock is lost

    [ http://issues.apache.org/jira/browse/HADOOP-656?page=comments#action_12447471 ] 
            
dhruba borthakur commented on HADOOP-656:
-----------------------------------------

I can reliably cause CRC corruption when the lease times out. The following scenario explains how:

The client renews its lease every 30 seconds. The namenode declares a client 'dead' if it does not receive a lease-renewal message within 60 seconds. The namenode then reclaims the data blocks for that file; these blocks may then be allocated to another file.

If a client's lease renewal is delayed by more than 60 seconds (due to network congestion, slow responses from datanodes, etc.), the namenode will expire the lease and reclaim the blocks of the file in question. The namenode may then allocate these blocks to a new file, and the writer of the new file may start writing to one of them. Meanwhile, the original writer may continue to flush its data to the same block, because it has not yet seen a lease-timeout exception. This may lead to data corruption.

Forcing the lease-expiration timeouts to occur immediately causes CRC corruption to show up.
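
To make the timeline above concrete, here is a minimal sketch of the race using a toy in-memory model. It is not Hadoop code: ToyNameNode, staleWriteHappens, and the client names "A" and "B" are hypothetical, and only the 30-second and 60-second values are taken from the scenario described above.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseRaceSketch {
    // Values from the scenario above; everything else in this file is illustrative.
    static final long RENEW_INTERVAL_MS = 30_000; // client renews its lease every 30 s
    static final long LEASE_EXPIRY_MS = 60_000;   // namenode gives up after 60 s of silence

    /** Hypothetical toy namenode: last renewal time per client, current owner per block. */
    static class ToyNameNode {
        final Map<String, Long> lastRenewal = new HashMap<>();
        final Map<Integer, String> blockOwner = new HashMap<>();

        void renew(String client, long now) {
            lastRenewal.put(client, now);
        }

        void allocate(int block, String client, long now) {
            blockOwner.put(block, client);
            renew(client, now);
        }

        /** Drops expired leases and reclaims the blocks their holders owned. */
        void expireLeases(long now) {
            lastRenewal.entrySet().removeIf(e -> now - e.getValue() > LEASE_EXPIRY_MS);
            blockOwner.values().removeIf(owner -> !lastRenewal.containsKey(owner));
        }
    }

    /**
     * Returns true if writer "A" would flush into a block that the namenode
     * has already handed to writer "B", given how late A's renewal arrives.
     */
    public static boolean staleWriteHappens(long renewalDelayMs) {
        ToyNameNode nn = new ToyNameNode();
        int block = 1;
        nn.allocate(block, "A", 0);

        long now = renewalDelayMs;      // simulated time at which A's renewal would arrive
        nn.expireLeases(now);           // namenode checks leases before hearing from A
        if (!nn.blockOwner.containsKey(block)) {
            nn.allocate(block, "B", now); // reclaimed block is given to a new file
        }

        // A has not been notified of anything, so it still flushes to this block.
        return !"A".equals(nn.blockOwner.get(block));
    }

    public static void main(String[] args) {
        System.out.println("delay 30 s: stale write = " + staleWriteHappens(RENEW_INTERVAL_MS));
        System.out.println("delay 61 s: stale write = " + staleWriteHappens(61_000));
    }
}
```

With an on-time renewal the block still belongs to A; once the delay exceeds 60 seconds the block has been reassigned to B while A keeps writing, which is the window in which the corruption described above can occur.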


> dfs locking doesn't notify the application when a lock is lost
> --------------------------------------------------------------
>
>                 Key: HADOOP-656
>                 URL: http://issues.apache.org/jira/browse/HADOOP-656
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Sameer Paranjpye
>
> DFS locks may be lost for failing to renew the lease on time, but the application is not notified about the loss of the lock and may therefore perform operations assuming it has the lock, even though the lock has been given to another process. I propose that DFS operations check to see if that client has lost a lock since the last check and if so throw a LostLockException.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira