Posted to hdfs-dev@hadoop.apache.org by Yasin Celik <ya...@gmail.com> on 2016/09/17 18:44:03 UTC

Re: Using HDFS DataNode as a part of a project, where to start

Hello Ravi,

Thank you for your response. I have another question:

I am trying to trace a call from org.apache.hadoop.fs.FsShell to the
NameNode. I am running a simple, basic "ls" to understand how the
mechanism works. What I want to know is which classes are used along the
way when the "ls" is executed. To understand the mechanism, I print some
messages along the way. My trace output is below; I get lost between
lines 34 and 35 of the trace. I also could not find how DFSClient,
DistributedFileSystem and FileSystem are used along this path when
running "ls". Any comment/help is appreciated. I basically want to know
the call sequence of a basic shell command "ls".

Thanks
Yasin


1     -     FsShell.main: [-ls, /test]
2     -     FsShell.run: [-ls, /test]
3     -     FsShell.init 1
4     -     FsShell.init 2
5     -     FsShell.registerCommands
6     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.FsCommand
7     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.AclCommands
8     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.CopyCommands
9     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Count
10     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Delete
11     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Display
12     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.find.Find
13     -     CommandFactory.registerCommands org.apache.hadoop.fs.FsShellPermissions
14     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.FsUsage
15     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Ls
16     -     LS.registerCommands: org.apache.hadoop.fs.shell.CommandFactory@51c8530f
17     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Mkdir
18     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.MoveCommands
19     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.SetReplication
20     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Stat
21     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Tail
22     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Test
23     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Touch
24     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Truncate
25     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.SnapshotCommands
26     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.XAttrCommands
27     -     FsShell.run:cmd: -ls
28     -     FsShell.run:Command Name: ls
29     -     FsShell.run: Started:********************************** org.apache.hadoop.fs.shell.Ls@3bd94634
30     -     Command.run
31     -     LS.processOptions: args[0]: /test
32     -     Command.run. After ProcessOptions
33     -     Command.processRawArguments
34     -     Command.expandArguments: /test
35     -     UserGroupInformation.getCurrentUser()
36     -     UserGroupInformation.getLoginUser() 1
37     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.ensureInitialized() 1
38     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 1
39     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 2
40     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 3
41     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 4
42     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.ensureInitialized() 2
43     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 1 yasin (auth:null)
44     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 2
45     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 3
46     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 4
47     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.spawnAutoRenewalThreadForUserCreds() 1
48     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 7
49     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 8
50     -     UserGroupInformation.getLoginUser() 2
51     -     UserGroupInformation.getCurrentUser()
52     -     DistributedFileSystem.DistributedFileSystem()
53     -     DistributedFileSystem.DistributedFileSystem()
54     -     FileSystem.createFileSystem hdfs://localhost:54310
55     -     FileSystem.initialize hdfs://localhost:54310
56     -     DistributedFileSystem.initialize hdfs://localhost:54310
57     -     UserGroupInformation.getCurrentUser()
58     -     DFSClient.DFSClient 2:
59     -     NamenodeProxies.creatProxy :
60     -     UserGroupInformation.getCurrentUser()
61     -     NamenodeProxies.creatNNProxyWithClientProtocol:localhost/127.0.0.1:54310
62     -     DFSClient.DFSClient 1:
63     -     Command.processArguments 1
64     -     Command.processArguments /test
65     -     Command.processArgument: item: /test
66     -     Command.recursePath: /test
67     -     DistributedFileSystem.listStatusInternal:Path:test
68     -     DFSCleint.listpaths: 1 /test
69     -     DFSCleint.listpaths: 2 /test
70     -     Found 2 items
71     -     LS.processPaths items: [Lorg.apache.hadoop.fs.shell.PathData;@75f95314
72     -     Command.processPaths: item: /test/output
73     -     =====LS.processPath:item: /test/output
74     -     =====LS.processPath Before out: drwxr-xr-x   - yasin supergroup          0 2016-09-09 11:40 /test/output
75     -     drwxr-xr-x   - yasin supergroup          0 2016-09-09 11:40 /test/output
76     -     =====LS.processPath After out: drwxr-xr-x   - yasin supergroup          0 2016-09-09 11:40 /test/output
77     -     Command.processPaths: item: /test/text2gb.txt
78     -     =====LS.processPath:item: /test/text2gb.txt
79     -     =====LS.processPath Before out: -rw-r--r--   3 yasin supergroup 2002096300 2016-09-06 17:04 /test/text2gb.txt
80     -     -rw-r--r--   3 yasin supergroup 2002096300 2016-09-06 17:04 /test/text2gb.txt
81     -     =====LS.processPath After out: -rw-r--r--   3 yasin supergroup 2002096300 2016-09-06 17:04 /test/text2gb.txt
82     -     Command.run. After processRawArguments
83     -     FsShell.run: STOP:**********************************


Yasin Celik

On Wed, Aug 10, 2016 at 1:32 PM, Ravi Prakash <ra...@gmail.com> wrote:

> Hi Yasin!
>
> Without knowing more about your project, here are answers to your
> questions.
>
> It's trivially easy to start only the Datanode. The HDFS code is very
> modular:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop
> is a script you can use.
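>
> For example, something like this should work to launch just a Datanode
> (a rough, untested sketch; createDataNode is the same entry point
> MiniDFSCluster uses):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hdfs.HdfsConfiguration;
>     import org.apache.hadoop.hdfs.server.datanode.DataNode;
>
>     public class StartDataNode {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = new HdfsConfiguration();
>         // The Datanode registers with whatever Namenode fs.defaultFS
>         // points at; adjust the host/port for your setup.
>         conf.set("fs.defaultFS", "hdfs://localhost:8020");
>         DataNode dn = DataNode.createDataNode(args, conf);
>         dn.join(); // block until the Datanode shuts down
>       }
>     }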
>
> Obviously, though, the Datanode will try to talk to a Namenode via the
> Namenode RPC mechanism:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java
>
> If you wanted to modify the Namenode, here's the RPC interface it
> exports:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
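>
> As a rough illustration (an untested sketch; this is essentially the
> path DFSClient takes for "ls"), a client reaches
> NameNodeRpcServer.getListing through a ClientProtocol proxy:
>
>     import java.net.URI;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hdfs.NameNodeProxies;
>     import org.apache.hadoop.hdfs.protocol.ClientProtocol;
>     import org.apache.hadoop.hdfs.protocol.DirectoryListing;
>     import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;
>
>     public class RawGetListing {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         ClientProtocol nn = NameNodeProxies.createProxy(conf,
>             URI.create("hdfs://localhost:8020"), ClientProtocol.class)
>             .getProxy();
>         // Served by NameNodeRpcServer.getListing on the Namenode side.
>         DirectoryListing listing =
>             nn.getListing("/", HdfsFileStatus.EMPTY_NAME, false);
>         for (HdfsFileStatus status : listing.getPartialListing()) {
>           System.out.println(status.getLocalName());
>         }
>       }
>     }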
>
> Good luck with your project!
> HTH
> Ravi
>
> On Wed, Aug 10, 2016 at 9:11 AM, Yasin Celik <ya...@gmail.com>
> wrote:
>
>> Hello All,
>>
>> I am working on a P2P storage project for research purposes.
>> I want to use the HDFS DataNode as part of this research project.
>> One possibility is to use only the DataNode as a storage engine and do
>> everything else at an upper layer. In that case I would do all the
>> metadata management and replication at the upper layer and use the
>> DataNode only for storing data on each node.
>>
>> The second possibility is to also use the NameNode for metadata
>> management and modify it to fit my project.
>>
>> I have been trying to figure out where to start. How much modularity is
>> there in HDFS?
>> Can I use the DataNode alone and modify it to fit my project? What are
>> the inputs and outputs of the DataNode? Where should I start?
>>
>> If I decide to also use the NameNode, where should I start?
>>
>> Any comment/help is appreciated.
>>
>> Thanks
>>
>> Yasin Celik
>>
>
>