Posted to hdfs-dev@hadoop.apache.org by Alberich de megres <al...@gmail.com> on 2010/04/01 11:50:04 UTC

HDFS Blockreport question

Hi everyone!

Browsing through the HDFS source code that comes with Hadoop 0.20.2, I
could not understand how HDFS sends the block report to the NameNode.

As far as I can see, in
src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java we
create the this.namenode interface with an RPC.waitForProxy call (I
could not work out which class it instantiates or how it works).

After that, the datanode generates the block list report (blockListAsLongs)
with data.getBlockReport and calls this.namenode.blockReport(..);
namenode.blockReport in turn calls namesystem.processReport.
This leads to an update of the block lists inside the namenode.

But how is this block report sent over the network?

Can anyone shed some light on this?

Thanks for everything!
(and sorry for the newbie question)

Alberich

Re: HDFS Blockreport question

Posted by Alberich de megres <al...@gmail.com>.
Sorry for the late answer,
but thanks for the tips, guys!

Now the hard work, I think, will be trying to understand the HDFS wire
protocol in the ipc package.

Thanks!


On Tue, Apr 6, 2010 at 4:50 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:
> Hey Jay,
>
> I think, if you're experienced in implementing transfer protocols, it is not difficult to implement the HDFS wire protocol.  As you point out, the protocols are subject to change between releases (especially between 0.20, 0.21, and 0.22) and are basically documented only in fragments in the Java source code.  At least, I looked at doing this for the read portions, and it wasn't horrible.
>
> However, the *really hard part* is the client retry/recovery logic.  That's where a lot of the intelligence is, in very large classes, and not incredibly well-documented.
>
> I've had lots of luck with scaling libhdfs - we average >20TB / day and billions of I/O operations a day with it.  I'd strongly advise not re-inventing the wheel, unless it's for a research project.
>
> Brian

Re: HDFS Blockreport question

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Jay,

I think, if you're experienced in implementing transfer protocols, it is not difficult to implement the HDFS wire protocol.  As you point out, the protocols are subject to change between releases (especially between 0.20, 0.21, and 0.22) and are basically documented only in fragments in the Java source code.  At least, I looked at doing this for the read portions, and it wasn't horrible.

However, the *really hard part* is the client retry/recovery logic.  That's where a lot of the intelligence is, in very large classes, and not incredibly well-documented.

I've had lots of luck with scaling libhdfs - we average >20TB / day and billions of I/O operations a day with it.  I'd strongly advise not re-inventing the wheel, unless it's for a research project.

Brian

On Apr 6, 2010, at 8:53 AM, Jay Booth wrote:

> A pure C library to communicate with HDFS?
> 
> Certainly possible, but it would be a lot of work, and the HDFS wire
> protocols are ad hoc, only somewhat documented, and subject to change between
> releases right now, so you'd be chasing a moving target.  I'd try to think of
> another way to accomplish what you want to do before attempting a client
> reimplementation in C right now.  If you only need to talk to the namenode
> and not the datanodes it might be a little easier, but it is still a lot of work
> that will probably be obsolete after another release or two.


Re: HDFS Blockreport question

Posted by Jay Booth <ja...@gmail.com>.
A pure C library to communicate with HDFS?

Certainly possible, but it would be a lot of work, and the HDFS wire
protocols are ad hoc, only somewhat documented, and subject to change between
releases right now, so you'd be chasing a moving target.  I'd try to think of
another way to accomplish what you want to do before attempting a client
reimplementation in C right now.  If you only need to talk to the namenode
and not the datanodes it might be a little easier, but it is still a lot of work
that will probably be obsolete after another release or two.


On Tue, Apr 6, 2010 at 9:47 AM, Alberich de megres <al...@gmail.com> wrote:

> Thanks!
>
> I'm already using Eclipse to browse the code.
> In this scenario, my understanding is that Java serializes the call and
> its parameters and sends them over the network.  Is that right?
>
> For example, if I wanted to write a pure C library (with no JNI
> interfaces), would that be possible/feasible?  Or would it be like trying
> to freeze hell?
>
> Thanks once again!!!

Re: HDFS Blockreport question

Posted by Alberich de megres <al...@gmail.com>.
Thanks!

I'm already using Eclipse to browse the code.
In this scenario, my understanding is that Java serializes the call and
its parameters and sends them over the network.  Is that right?

For example, if I wanted to write a pure C library (with no JNI
interfaces), would that be possible/feasible?  Or would it be like trying
to freeze hell?

Thanks once again!!!


On Sat, Apr 3, 2010 at 1:54 AM, Ryan Rawson <ry...@gmail.com> wrote:
> If you look at the getProxy code, it passes an "Invoker" (or something
> like that) to which the proxy code delegates calls.  The
> Invoker calls another class, "Client", which has inner classes like
> Call and Connection that wrap the actual Java I/O.  This all lives in
> the org.apache.hadoop.ipc package.
>
> Be sure to use a good IDE like IntelliJ IDEA or Eclipse to browse the code; it
> makes following all this stuff much easier.

Re: HDFS Blockreport question

Posted by Ryan Rawson <ry...@gmail.com>.
If you look at the getProxy code, it passes an "Invoker" (or something
like that) to which the proxy code delegates calls.  The
Invoker calls another class, "Client", which has inner classes like
Call and Connection that wrap the actual Java I/O.  This all lives in
the org.apache.hadoop.ipc package.
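
The mechanism described above can be illustrated with the JDK's own dynamic proxy support (java.lang.reflect.Proxy). What follows is only a minimal sketch, not Hadoop code: ReportProtocol, RecordingInvoker and newProxy are hypothetical names invented for the illustration, and the handler merely records the call where a real invoker would pass it on to a network client.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxySketch {
    /** Hypothetical stand-in for DatanodeProtocol: method signatures only, no bodies. */
    public interface ReportProtocol {
        String blockReport(String datanodeName, long[] blocks);
    }

    /** Plays the role of the "Invoker": it receives every call made on the proxy. */
    public static class RecordingInvoker implements InvocationHandler {
        public final List<String> sent = new ArrayList<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // args is null for zero-argument methods, so guard before reading its length.
            int argCount = (args == null) ? 0 : args.length;
            // A real invoker would hand the call to a network client; this one only records it.
            sent.add(method.getName() + "/" + argCount);
            return "ack:" + method.getName();
        }
    }

    /** Builds an object that implements ReportProtocol at runtime, backed by the handler. */
    public static ReportProtocol newProxy(InvocationHandler handler) {
        return (ReportProtocol) Proxy.newProxyInstance(
                ReportProtocol.class.getClassLoader(),
                new Class<?>[] {ReportProtocol.class},
                handler);
    }

    public static void main(String[] args) {
        RecordingInvoker invoker = new RecordingInvoker();
        ReportProtocol namenode = newProxy(invoker);
        // Looks like a local call, but the body is supplied by the handler.
        String reply = namenode.blockReport("dn1", new long[] {1L, 2L, 3L});
        System.out.println(reply + " " + invoker.sent);
    }
}
```

This is why no class in the source tree needs to implement the interface on the client side: the implementing class is generated at runtime, and every method call lands in the handler's invoke method.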

Be sure to use a good IDE like IntelliJ IDEA or Eclipse to browse the code; it
makes following all this stuff much easier.





On Fri, Apr 2, 2010 at 4:39 PM, Alberich de megres
<al...@gmail.com> wrote:
> Hi again!
>
> Could anyone help me?
> I could not understand how the RPC class works. To me, it only seems to
> instantiate a single interface with no implementation for methods
> like blockReport. But then it uses RPC.getProxy to get a new class which
> sends messages to the namenode.
>
> I'm sorry for this silly question, but I am really lost at this point.
>
> Thanks for your patience.

Re: HDFS Blockreport question

Posted by Alberich de megres <al...@gmail.com>.
Hi again!

Could anyone help me?
I could not understand how the RPC class works. To me, it only seems to
instantiate a single interface with no implementation for methods
like blockReport. But then it uses RPC.getProxy to get a new class which
sends messages to the namenode.

I'm sorry for this silly question, but I am really lost at this point.

Thanks for your patience.



On Fri, Apr 2, 2010 at 2:11 AM, Alberich de megres
<al...@gmail.com> wrote:
> Hi Jay!
>
> Thanks for the answer, but what I'm asking is how it actually sends it.
> blockReport is a method of the DatanodeProtocol interface and has no
> implementation.
>
> thanks!

Re: HDFS Blockreport question

Posted by Alberich de megres <al...@gmail.com>.
Hi Jay!

Thanks for the answer, but what I'm asking is how it actually sends it.
blockReport is a method of the DatanodeProtocol interface and has no
implementation.

thanks!



On Thu, Apr 1, 2010 at 5:50 PM, Jay Booth <ja...@gmail.com> wrote:
> In DataNode:
> public DatanodeProtocol namenode
>
> It's not a reference to an actual namenode; it's a wrapper for a network
> protocol, created by that RPC.waitForProxy call. So when it calls
> namenode.blockReport, it's sending that information via RPC to the namenode
> instance over the network.

Re: HDFS Blockreport question

Posted by Jay Booth <ja...@gmail.com>.
In DataNode:
public DatanodeProtocol namenode

It's not a reference to an actual namenode; it's a wrapper for a network
protocol, created by that RPC.waitForProxy call. So when it calls
namenode.blockReport, it's sending that information via RPC to the namenode
instance over the network.
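
To make "sending that information over RPC" a bit more concrete, here is a rough sketch of the general idea: the client writes the method name and arguments into a byte frame, and the server reads the frame back and dispatches to the real object by reflection. This is an assumption-laden illustration and not the actual Hadoop wire format; CallFramingSketch, FakeNameNode, encode and dispatch are hypothetical names, and the frame layout (method name, argument count, then each long) was chosen only for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Method;

public class CallFramingSketch {
    /** Hypothetical server-side object; plays the role of the real namenode. */
    public static class FakeNameNode {
        public long blockReport(long[] blocks) {
            return blocks.length; // pretend to process the report
        }
    }

    /** Client side: write the method name, the argument count, then each long. */
    public static byte[] encode(String methodName, long[] blocks) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(methodName);
        out.writeInt(blocks.length);
        for (long b : blocks) {
            out.writeLong(b);
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Server side: read the frame back and invoke the named method by reflection. */
    public static Object dispatch(Object target, byte[] frame) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        String methodName = in.readUTF();
        long[] blocks = new long[in.readInt()];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = in.readLong();
        }
        Method m = target.getClass().getMethod(methodName, long[].class);
        // Cast to Object so the array is passed as one argument, not spread as varargs.
        return m.invoke(target, (Object) blocks);
    }

    public static void main(String[] args) throws Exception {
        // In a real system these bytes would travel over a socket between two processes.
        byte[] frame = encode("blockReport", new long[] {10L, 20L, 30L});
        System.out.println(frame.length + " bytes -> " + dispatch(new FakeNameNode(), frame));
    }
}
```

In the sketch both halves run in one process; the point is only that a method call can be reduced to bytes on one side and turned back into a method call on the other.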

On Thu, Apr 1, 2010 at 5:50 AM, Alberich de megres <al...@gmail.com> wrote:

> Hi everyone!
>
> Browsing through the HDFS source code that comes with Hadoop 0.20.2, I
> could not understand how HDFS sends the block report to the NameNode.
>
> As far as I can see, in
> src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java we
> create the this.namenode interface with an RPC.waitForProxy call (I
> could not work out which class it instantiates or how it works).
>
> After that, the datanode generates the block list report (blockListAsLongs)
> with data.getBlockReport and calls this.namenode.blockReport(..);
> namenode.blockReport in turn calls namesystem.processReport.
> This leads to an update of the block lists inside the namenode.
>
> But how is this block report sent over the network?
>
> Can anyone shed some light on this?
>
> Thanks for everything!
> (and sorry for the newbie question)
>
> Alberich
>