Posted to hdfs-dev@hadoop.apache.org by Yasin Celik <ya...@gmail.com> on 2016/08/10 16:11:17 UTC

Using HDFS DataNode as a part of a project, where to start

Hello All,

I am working on a P2P storage project for research purposes.
I want to use the HDFS DataNode as part of a research project.
One possibility is to use only the DataNode as a storage engine and do
everything else at an upper layer. In that case I would have all the metadata
management and replication mechanisms at the upper layer and use the DataNode
only for storing data on each node.

The second possibility is to also use the NameNode for metadata management and
modify it to fit my project.

I have been trying to figure out where to start. How much modularity is there
in HDFS?
Can I use the DataNode alone and modify it to fit my project? What are the
inputs and outputs of the DataNode? Where should I start?

If I decide to also use the NameNode, where should I start?

Any comment/help is appreciated.

Thanks

Yasin Celik

Re: Using HDFS DataNode as a part of a project, where to start

Posted by Yasin Celik <ya...@gmail.com>.
Hello Ravi,

Thank you for your response. I have another question:

I am trying to trace a call from org.apache.hadoop.fs.FsShell to the NameNode.
I am running a basic "ls" to understand how the mechanism works.
What I want to know is which classes are used along the way when the "ls" is
executed. To understand the mechanism, I print some messages along the way.
Below are my print statements. I get lost between lines 34 and 35 of the output.
I also could not find how DFSClient, DistributedFileSystem and FileSystem
are used along this path when running "ls". Any comment/help is
appreciated. I basically want to know the call sequence of a basic shell
command such as "ls".

Thanks
Yasin


1     -     FsShell.main: [-ls, /test]
2     -     FsShell.run: [-ls, /test]
3     -     FsShell.init 1
4     -     FsShell.init 2
5     -     FsShell.registerCommands
6     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.FsCommand
7     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.AclCommands
8     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.CopyCommands
9     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Count
10     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Delete
11     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Display
12     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.find.Find
13     -     CommandFactory.registerCommands org.apache.hadoop.fs.FsShellPermissions
14     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.FsUsage
15     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Ls
16     -     LS.registerCommands: org.apache.hadoop.fs.shell.CommandFactory@51c8530f
17     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Mkdir
18     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.MoveCommands
19     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.SetReplication
20     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Stat
21     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Tail
22     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Test
23     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Touch
24     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.Truncate
25     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.SnapshotCommands
26     -     CommandFactory.registerCommands org.apache.hadoop.fs.shell.XAttrCommands
27     -     FsShell.run:cmd: -ls
28     -     FsShell.run:Command Name: ls
29     -     FsShell.run: Started:********************************** org.apache.hadoop.fs.shell.Ls@3bd94634
30     -     Command.run
31     -     LS.processOptions: args[0]: /test
32     -     Command.run. After ProcessOptions
33     -     Command.processRawArguments
34     -     Command.expandArguments: /test
35     -     UserGroupInformation.getCurrentUser()
36     -     UserGroupInformation.getLoginUser() 1
37     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.ensureInitialized() 1
38     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 1
39     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 2
40     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 3
41     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.initialize: 4
42     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.ensureInitialized() 2
43     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 1 yasin (auth:null)
44     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 2
45     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 3
46     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 4
47     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.spawnAutoRenewalThreadForUserCreds() 1
48     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 7
49     -     16/09/17 14:34:05 INFO security.UserGroupInformation: UserGroupInformation.loginUserFromSubject: 8
50     -     UserGroupInformation.getLoginUser() 2
51     -     UserGroupInformation.getCurrentUser()
52     -     DistributedFileSystem.DistributedFileSystem()
53     -     DistributedFileSystem.DistributedFileSystem()
54     -     FileSystem.createFileSystemhdfs://localhost:54310
55     -     FileSystem.initializehdfs://localhost:54310
56     -     DistributedFileSystem.initialize hdfs://localhost:54310
57     -     UserGroupInformation.getCurrentUser()
58     -     DFSClient.DFSClient 2:
59     -     NamenodeProxies.creatProxy :
60     -     UserGroupInformation.getCurrentUser()
61     -     NamenodeProxies.creatNNProxyWithClientProtocol:localhost/127.0.0.1:54310
62     -     DFSClient.DFSClient 1:
63     -     Command.processArguments 1
64     -     Command.processArguments /test
65     -     Command.processArgument: item: /test
66     -     Command.recursePath: /test
67     -     DistributedFileSystem.listStatusInternal:Path:test
68     -     DFSCleint.listpaths: 1 /test
69     -     DFSCleint.listpaths: 2 /test
70     -     Found 2 items
71     -     LS.processPaths items: [Lorg.apache.hadoop.fs.shell.PathData;@75f95314
72     -     Command.processPaths: item: /test/output
73     -     =====LS.processPath:item: /test/output
74     -     =====LS.processPath Before out: drwxr-xr-x   - yasin supergroup          0 2016-09-09 11:40 /test/output
75     -     drwxr-xr-x   - yasin supergroup          0 2016-09-09 11:40 /test/output
76     -     =====LS.processPath After out: drwxr-xr-x   - yasin supergroup          0 2016-09-09 11:40 /test/output
77     -     Command.processPaths: item: /test/text2gb.txt
78     -     =====LS.processPath:item: /test/text2gb.txt
79     -     =====LS.processPath Before out: -rw-r--r--   3 yasin supergroup 2002096300 2016-09-06 17:04 /test/text2gb.txt
80     -     -rw-r--r--   3 yasin supergroup 2002096300 2016-09-06 17:04 /test/text2gb.txt
81     -     =====LS.processPath After out: -rw-r--r--   3 yasin supergroup 2002096300 2016-09-06 17:04 /test/text2gb.txt
82     -     Command.run. After processRawArguments
83     -     FsShell.run: STOP:**********************************
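
For my own reference, here is a minimal sketch of the same listing done
through the public FileSystem API, which as far as I can tell is the path
FsShell ends up on (the hdfs://localhost:54310 URI and the /test path are
just from my setup above):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // FileSystem.get() resolves the hdfs:// scheme to DistributedFileSystem,
    // which creates a DFSClient and a ClientProtocol proxy to the NameNode
    // (entries 35-62 in the trace above, including the UGI login).
    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:54310"), conf);
    // listStatus() goes DistributedFileSystem.listStatusInternal ->
    // DFSClient.listPaths -> the getListing RPC (entries 67-69 above).
    for (FileStatus status : fs.listStatus(new Path("/test"))) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
  }
}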


Yasin Celik


Re: Using HDFS DataNode as a part of a project, where to start

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Yasin!

Without knowing more about your project, here are answers to your
questions.

It's trivially easy to start only the Datanode. The HDFS code is very
modular:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop
is a script you can use to launch it.
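
If you would rather embed the Datanode in your own process (which sounds
like what you want for a P2P storage layer), here is a rough sketch. It
assumes DataNode.createDataNode(String[], Configuration) is still the
programmatic entry point in your Hadoop version, so check it against your
source tree; the data directory and NameNode URI are made-up values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.server.datanode.DataNode;

public class StartDataNodeOnly {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Local directory where this Datanode keeps its block files (made-up path).
    conf.set("dfs.datanode.data.dir", "/tmp/dn-data");
    // The Namenode this Datanode will register with and heartbeat to.
    conf.set("fs.defaultFS", "hdfs://localhost:8020");
    // createDataNode instantiates the daemon and starts its service threads;
    // signature assumed from the 2.x source tree.
    DataNode dn = DataNode.createDataNode(new String[] {}, conf);
    dn.join();  // block until the Datanode shuts down
  }
}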

Obviously, though, the Datanode will try to talk to a Namenode via the
Namenode RPC mechanism:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java

If you wanted to modify the Namenode, here's the RPC interface it exports:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
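
For instance, the listing RPC behind "ls" lives on ClientProtocol, which
NameNodeRpcServer implements. A rough sketch of calling it directly is
below; NameNodeProxies and ClientProtocol are internal, unstable APIs, and
the signatures are my recollection of the 2.x tree, so double-check them
(the URI and path are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.NameNodeProxies;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DirectoryListing;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

public class RawGetListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Build a ClientProtocol proxy to the Namenode, as DFSClient does internally.
    ClientProtocol namenode = NameNodeProxies.createProxy(
        conf, URI.create("hdfs://localhost:8020"), ClientProtocol.class).getProxy();
    // getListing(src, startAfter, needLocation) is the RPC behind "ls".
    DirectoryListing listing =
        namenode.getListing("/test", HdfsFileStatus.EMPTY_NAME, false);
    for (HdfsFileStatus status : listing.getPartialListing()) {
      System.out.println(status.getLocalName());
    }
  }
}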

Good luck with your project!
HTH
Ravi
