Posted to common-dev@hadoop.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2006/03/29 10:27:55 UTC
Datanode / namenode UUIDs (Re: "lost" NDFS blocks following network reorg)
Stefan Groschupf wrote:
> Hi hadoop developers,
Hi,
My comments below. Generally speaking, I think you are right - datanodes
should be initialized with a UUID, created once and persisted across IP
and hostname changes, and this UUID should be used to identify
datanodes/namenodes. I think the "format" command should also be
implemented for datanodes, to create their UUID when starting for the
first time - later on this UUID should be retrieved from a local file
somewhere in the data dir.
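To make the idea concrete, here is a minimal sketch of that "format once,
read thereafter" behavior. The file name "storageID" and the method are my
own assumptions for illustration, not actual Hadoop code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Sketch: give a datanode a stable identity that survives IP/hostname
// changes by persisting a UUID in its data directory on first start.
public class DatanodeId {
    // "storageID" is a hypothetical file name, not Hadoop's actual layout.
    static String getOrCreateUuid(Path dataDir) throws IOException {
        Path idFile = dataDir.resolve("storageID");
        if (Files.exists(idFile)) {
            // Subsequent starts: reuse the persisted identity.
            return Files.readString(idFile).trim();
        }
        // First start ("format" step): mint a UUID and persist it.
        String id = UUID.randomUUID().toString();
        Files.createDirectories(dataDir);
        Files.writeString(idFile, id);
        return id;
    }
}
```

The point is that the identity lives with the data, so a node that comes
back under a new hostname or port still reports as the same datanode.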
> The local name of the data node is machineName + ":" + tmpPort. So
> it can change if the port is blocked or the machine name changes.
> Maybe we should create the datanode's name only once and write it to
> the data folder to be able to read it later on. (?)
>
> This local name is used to send block reports to the name node.
> FSNamesystem#processReport(Block newReport[], UTF8 dataNodeLocalName)
> processes this report.
> In the first line of this method, the DatanodeInfo is looked up by the
> datanode's localName. The datanode is already in this map, since a
> heartbeat is sent before a block report.
> So:
> DatanodeInfo node = (DatanodeInfo) datanodeMap.get(name); // no
> problem, but just an 'empty' container:
> ...
> Block oldReport[] = node.getBlocks(); // will return null since no
> Blocks are yet associated with this node.
>
> Since oldReport is null, all code is skipped until line 901. But this
> only adds the blocks to the node container.
Umm.. I don't follow. Lines 901-905 will add these blocks from the
newReport, because newPos == 0.
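For readers without the source at hand, the report-processing idea being
discussed can be sketched as a merge over two sorted block lists. This is
an illustrative reconstruction under my own assumptions, not the actual
FSNamesystem code; it shows why, when oldReport is null, every reported
block is treated as newly added (the newPos == 0 case above):

```java
import java.util.List;

// Sketch of diffing an old and a new block report (both sorted by block
// id): blocks present only in the new report are additions, blocks
// present only in the old report are candidates for removal.
public class ReportDiff {
    static void diff(long[] oldReport, long[] newReport,
                     List<Long> added, List<Long> removed) {
        if (oldReport == null) oldReport = new long[0]; // first report from a node
        int oldPos = 0, newPos = 0;
        while (oldPos < oldReport.length && newPos < newReport.length) {
            if (oldReport[oldPos] == newReport[newPos]) {
                oldPos++; newPos++;                     // block unchanged
            } else if (oldReport[oldPos] < newReport[newPos]) {
                removed.add(oldReport[oldPos++]);       // only in old report
            } else {
                added.add(newReport[newPos++]);         // only in new report
            }
        }
        while (oldPos < oldReport.length) removed.add(oldReport[oldPos++]);
        while (newPos < newReport.length) added.add(newReport[newPos++]);
    }
}
```

With oldReport == null the first loop never runs, so the tail loop copies
the entire newReport into added - every block gets associated with the node.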
>
> In line 924 begins a section of code that collects all obsolete
> blocks. First of all, I wonder why we iterate through all blocks here;
> this could be expensive, and it would be enough to iterate over the
> blocks that are reported by this datanode, wouldn't it?
> Whether a block is still valid is tested by FSDirectory#isValidBlock,
> which checks if the block is in activeBlocks.
> The problem I see now is that the only method that adds Blocks to
> activeBlocks is unprotectedAddFile(UTF8 name, Block blocks[]). But
> here the name node's local name, which may change, is also involved.
> This method is also used to load the state of a stopped or crashed
> name node.
> So in case you stop the dfs and change host names, a set of blocks
> will be marked as obsolete and deleted.
I'm not 100% sure if this part is correct, but it makes me nervous, too,
to involve such ephemeral things as IP/hostname in handling data that
persists across IP/hostname changes...
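A toy sketch of the failure mode being described, under my own assumptions
rather than actual Hadoop code: if the namenode keys its datanode map by
"host:port", a renamed node looks brand new and its old block associations
are orphaned, whereas a persisted UUID key survives the rename:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Two ways a namenode could key its datanode -> blocks map. The names
// and structure here are illustrative only.
public class NodeMap {
    static Map<String, Set<Long>> byName = new HashMap<>(); // keyed by host:port
    static Map<String, Set<Long>> byUuid = new HashMap<>(); // keyed by persisted UUID

    static Set<Long> lookupByName(String hostPort) {
        return byName.computeIfAbsent(hostPort, k -> new HashSet<>());
    }

    static Set<Long> lookupByUuid(String uuid) {
        return byUuid.computeIfAbsent(uuid, k -> new HashSet<>());
    }
}
```

Registering blocks under "oldhost:50010" and then looking up
"newhost:50010" yields an empty set - exactly the "lost blocks after a
network reorg" symptom - while a UUID lookup is unaffected by the rename.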
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___| || __| |  \| ||  |  Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com