Posted to mapreduce-user@hadoop.apache.org by Demai Ni <ni...@gmail.com> on 2014/10/29 21:55:47 UTC

[HDFS] result order of getFileBlockLocations() and listFiles()?

Hi guys,

I am trying to implement a simple program (experimental, not for
production). It invokes FileSystem.listFiles() to get the list of files
under an HDFS folder, and then uses FileSystem.getFileBlockLocations() to
get the replica locations of each file's blocks.

Since it is a controlled environment, I can make sure the files are static,
and I don't need to worry about datanode crashes, failover, etc.

Assume that within a small time window (say, 1 minute), anywhere from a
hundred to several thousand clients invoke the same program to look up the
same folder. Will the above two APIs guarantee the *same result in the same
order* for all clients?

To elaborate a bit more, say a folder called /dfs/dn/user/data contains
three files: file1, file2, and file3. Suppose client1 gets:
listFiles(): file1, file2, file3
getFileBlockLocations(file1) -> datanode1, datanode3, datanode6

Will all other clients get the same information (I think so), and in the
same order? Or do I have to sort on each client to guarantee the order?
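
If sorting does turn out to be necessary, the client-side step I have in
mind would look roughly like the sketch below. It is only an illustration
and does not call HDFS at all: the class StableOrder and its two helper
methods are hypothetical names, and the plain strings stand in for the paths
returned by listFiles() and the host names from BlockLocation.getHosts().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop: makes the order deterministic
// on the client regardless of the order the NameNode returned.
public class StableOrder {

    // Returns a sorted copy of the file paths; the input list is not modified.
    static List<String> sortedPaths(List<String> paths) {
        List<String> copy = new ArrayList<>(paths);
        Collections.sort(copy); // natural (lexicographic) String order
        return copy;
    }

    // Returns a sorted copy of the replica host names for one block.
    static String[] sortedHosts(String[] hosts) {
        String[] copy = Arrays.copyOf(hosts, hosts.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Stand-ins for what listFiles() and getHosts() might return.
        List<String> files = sortedPaths(Arrays.asList(
                "/dfs/dn/user/data/file3",
                "/dfs/dn/user/data/file1",
                "/dfs/dn/user/data/file2"));
        String[] hosts = sortedHosts(
                new String[] {"datanode6", "datanode1", "datanode3"});

        System.out.println(files);
        System.out.println(Arrays.toString(hosts));
    }
}
```

With this, every client would end up with the same order as long as the
underlying set of files and replicas is the same, whatever order the APIs
return them in.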

Many thanks for your input.

Demai