Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2008/04/09 08:11:24 UTC

[jira] Commented: (HADOOP-3069) A failure on SecondaryNameNode truncates the primary NameNode image.

    [ https://issues.apache.org/jira/browse/HADOOP-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587061#action_12587061 ] 

dhruba borthakur commented on HADOOP-3069:
------------------------------------------

+1. Code change looks good. There are a few white-space-change-only lines in the patch though.

> A failure on SecondaryNameNode truncates the primary NameNode image.
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3069
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3069
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.16.3
>
>         Attachments: TruncatePrimaryImageBug.patch
>
>
> When the primary name-node pulls the new image from the secondary 
> and the transfer fails for some reason, the primary still treats the new image, 
> which may be only partially transferred or not transferred at all, 
> as valid and rolls it into the new file system image, which then ends up either corrupted or empty.
> The problem here is that the error message from the secondary node does not reach the primary.
> This happens because TransferFsImage.getFileServer() closes the connection output stream 
> in its finally block. The secondary later sends the error reply, which the primary can no longer receive,
> and this causes the following exception on the secondary:
> {code}
> 08/03/21 12:16:52 ERROR NameNode.Secondary: java.io.FileNotFoundException: \hadoop-data\hdfs\namesecondary\destimage.tmp (The system cannot find the file specified)
> 08/03/21 12:16:56 WARN /: /getimage?getimage=1: 
> java.lang.IllegalStateException: Committed
> 	at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
> 	at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
> 	at org.apache.hadoop.dfs.SecondaryNameNode$GetImageServlet.doGet(SecondaryNameNode.java:485)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> 	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> 	at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> 	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> 	at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> 	at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> 	at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> 	at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> 	at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> 	at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> 	at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> 	at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> 	at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> {code}
> But the exception does not affect the behavior of the primary node. Since the stream is closed, the primary thinks 
> the file transfer finished successfully and proceeds accordingly.
> There are 2 bugs that need to be fixed here.
> # The error message should be delivered to the primary, and the primary should not corrupt its image in case of an error.
> # The doGet() method of both HttpServlets should catch not only IOExceptions but any exception. 
> If we miss an NPE or a SecurityException, the main image will be truncated.
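
The two fixes listed in the description can be illustrated with a minimal, self-contained Java sketch. This is not the attached patch and does not use the Hadoop or servlet classes: ImageTransferSketch, ImageSource, Reply, serve() and rollImage() are hypothetical names introduced only for illustration, and the expected length is assumed to be known to the receiver by some other means. The sender turns any exception into an explicit error reply, and the receiver stages the bytes in a temporary file and replaces the image only when the reply is good and complete.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only; none of these names come from the Hadoop code base.
final class ImageTransferSketch {

    /** Hypothetical source of image bytes; may fail with any exception. */
    interface ImageSource {
        byte[] read() throws Exception;
    }

    /** Hypothetical reply: an explicit status travels with the payload. */
    static final class Reply {
        final boolean ok;
        final String error; // null when ok
        final byte[] body;  // empty when not ok

        Reply(boolean ok, String error, byte[] body) {
            this.ok = ok;
            this.error = error;
            this.body = body;
        }
    }

    /** Sender side: turn every exception, not only IOException, into an error reply. */
    static Reply serve(ImageSource source) {
        try {
            return new Reply(true, null, source.read());
        } catch (Exception e) {
            // Also covers NullPointerException and SecurityException.
            return new Reply(false, e.toString(), new byte[0]);
        }
    }

    /**
     * Receiver side: stage the bytes in a temporary file and replace the image
     * only when the reply is ok and exactly expectedLength bytes arrived.
     * Returns true if the image was replaced, false if it was left untouched.
     */
    static boolean rollImage(Reply reply, long expectedLength, Path image) throws IOException {
        if (!reply.ok) {
            return false; // the sender reported an error: keep the old image
        }
        Path tmp = image.resolveSibling(image.getFileName() + ".tmp");
        try (InputStream in = new ByteArrayInputStream(reply.body)) {
            long received = Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            if (received != expectedLength) {
                return false; // short or empty transfer: keep the old image
            }
            Files.move(tmp, image, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

In this sketch a call such as rollImage(serve(source), expectedLength, imagePath) leaves the existing image file untouched whenever the source throws or delivers fewer bytes than expected, which is the property the first item in the list asks for.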

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.