Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2007/10/15 23:36:50 UTC

[jira] Commented: (HADOOP-2044) Namenode encounters ClassCastException exceptions for INodeFileUnderConstruction

    [ https://issues.apache.org/jira/browse/HADOOP-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534985 ] 

Konstantin Shvachko commented on HADOOP-2044:
---------------------------------------------

The patch fixes 2 important bugs:
# Access to and modification of FSNamesystem.sortedLeases should be synchronized under a dedicated lock, which is different from the global FSNamesystem lock.
The bug was that startFileInternal() modified sortedLeases while holding the global lock but without the leases lock.
# completeFile() and getAdditionalBlock() cannot assume that they always deal with a file under construction, because if the client fails to remove the lease,
the file is automatically converted into a regular file, and modifications of such files are not allowed.
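
The second bug can be illustrated with a minimal sketch. It is not the code from the patch: CompleteFileSketch, its map-based namespace, and the expireLease() helper are hypothetical simplifications, and only the two inode class names are borrowed from the stack trace in the issue. The point is that the type is checked before the cast, so a file that was already converted to a regular file is rejected instead of causing a ClassCastException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the namenode namespace.
class CompleteFileSketch {
    static class INodeFile {
        final String path;
        INodeFile(String path) { this.path = path; }
    }

    static class INodeFileUnderConstruction extends INodeFile {
        final String clientName;
        INodeFileUnderConstruction(String path, String clientName) {
            super(path);
            this.clientName = clientName;
        }
    }

    private final Map<String, INodeFile> files = new HashMap<>();

    synchronized void startFile(String path, String client) {
        files.put(path, new INodeFileUnderConstruction(path, client));
    }

    // Models lease expiry: a pending file is converted into a regular file.
    synchronized void expireLease(String path) {
        INodeFile f = files.get(path);
        if (f instanceof INodeFileUnderConstruction) {
            files.put(path, new INodeFile(path));
        }
    }

    // Returns false (instead of failing on a blind cast) when the file is
    // missing, is no longer under construction, or belongs to another client.
    synchronized boolean completeFile(String path, String client) {
        INodeFile f = files.get(path);
        if (!(f instanceof INodeFileUnderConstruction)) {
            return false;
        }
        INodeFileUnderConstruction pending = (INodeFileUnderConstruction) f;
        if (!pending.clientName.equals(client)) {
            return false;
        }
        files.put(path, new INodeFile(path));
        return true;
    }
}
```

A real implementation would more likely report an error to the client than return a boolean; the boolean just keeps the sketch short.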

I have 2 comments, which do not address the correctness of the patch but are rather intended to make the code more understandable.
# Instead of using the "instanceof INodeFileUnderConstruction" operator it is better to define a virtual method INode.isFileUnderConstruction().
# All modifications of the FSNamesystem.leases member are performed under the global FSNamesystem lock,
while modifications of FSNamesystem.sortedLeases are done under the lock associated with FSNamesystem.leases.
This is correct but unintuitive. I'd propose to replace all 
{code}synchronized (leases) {}
{code}
sections by
{code}synchronized (sortedLeases) {}
{code}
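
Both suggestions can be sketched together as follows. This is only an illustration under stated assumptions: LeaseLockSketch and its methods are hypothetical, the lease set holds plain holder names rather than real Lease objects, and none of this is taken from the patch. It shows a virtual isFileUnderConstruction() that the subclass overrides, and a sortedLeases collection that is always accessed under its own monitor.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch, not the actual FSNamesystem code.
class LeaseLockSketch {
    // The base class answers "no"; only the under-construction subclass overrides it.
    static class INode {
        boolean isFileUnderConstruction() { return false; }
    }

    static class INodeFile extends INode { }

    static class INodeFileUnderConstruction extends INodeFile {
        @Override
        boolean isFileUnderConstruction() { return true; }
    }

    // Assumption: leases are represented by holder names to keep the sketch short.
    private final SortedSet<String> sortedLeases = new TreeSet<>();

    // Every access locks the collection it touches, so the lock is easy to find.
    void addLease(String holder) {
        synchronized (sortedLeases) {
            sortedLeases.add(holder);
        }
    }

    boolean removeLease(String holder) {
        synchronized (sortedLeases) {
            return sortedLeases.remove(holder);
        }
    }

    int leaseCount() {
        synchronized (sortedLeases) {
            return sortedLeases.size();
        }
    }
}
```

Callers can then write node.isFileUnderConstruction() instead of an instanceof test, and a reader looking for the lock that guards sortedLeases finds it on sortedLeases itself.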

> Namenode encounters ClassCastException exceptions for INodeFileUnderConstruction
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-2044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2044
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: InodeClassException.patch
>
>
> A distcp command running on one 400 node cluster shows this exception:
> org.apache.hadoop.fs.FSNamesystem: Removing lease [Lease.  Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 37 31 30 31 31 32 32 35 37 5f 30 30 30 33 5f 6d 5f 30 30 30 30 39 32 5f 30, heldlocks: 0, pendingcreates: 0], leases remaining: 736
> org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /user/xxxx/logs_21/_task_200710112257_0003_m_000027_0/part-00027 file does not exist.
>  org.apache.hadoop.fs.FSNamesystem: Removing lease [Lease.  Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 37 31 30 31 31 32 32 35 37 5f 30 30 30 33 5f 6d 5f 30 30 30 30 32 37 5f 30, heldlocks: 0, pendingcreates: 0], leases remaining: 735
>  org.apache.hadoop.fs.FSNamesystem: java.lang.ClassCastException: org.apache.hadoop.dfs.INodeFile cannot be cast to org.apache.hadoop.dfs.INodeFileUnderConstruction
>         at org.apache.hadoop.dfs.FSNamesystem.internalReleaseCreate(FSNamesystem.java:1566)
>         at org.apache.hadoop.dfs.FSNamesystem.access$100(FSNamesystem.java:51)
>         at org.apache.hadoop.dfs.FSNamesystem$Lease.releaseLocks(FSNamesystem.java:1463)
>         at org.apache.hadoop.dfs.FSNamesystem$LeaseMonitor.run(FSNamesystem.java:1525)
>         at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.