Posted to hdfs-dev@hadoop.apache.org by Grandl Robert <rg...@yahoo.com> on 2013/07/22 05:41:16 UTC

non-local map task input

Hi guys,

I am trying to find all the points in the HDFS code where HDFS traffic is read or written. As far as I can tell, most of the traffic goes through BlockSender/BlockReceiver, right?

However, when a client does a copyFromLocal, reads a file, or runs a map task whose input is not local, it seems that DFSClient is invoked. My understanding is that DFSClient gets the datanode locations from the namenode and then directly opens a socket to read or write. I am not sure where that happens, though. Can someone point me to where in the code I can find the exact calls that read from or write to other datanodes through DFSClient?

Thanks in advance,
Robert

Re: non-local map task input

Posted by Colin Patrick McCabe <ra...@gmail.com>.
Try looking at DFSOutputStream / DFSInputStream.
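
For orientation, the flow described in the question (look up where a block lives, then open a socket to that node and read the bytes) can be modeled with a toy sketch. Nothing below is Hadoop code: ToyRemoteRead, startToyDataNode, readBlock and the map standing in for the namenode are all hypothetical names made up for illustration. In the real code base the client side of this flow is where DFSInputStream (reads) and DFSOutputStream (writes) come in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are Hadoop classes.
public class ToyRemoteRead {

    // Stand-in for a datanode: accepts one connection and streams one block.
    static Thread startToyDataNode(ServerSocket server, byte[] block) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 OutputStream out = s.getOutputStream()) {
                out.write(block);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        t.start();
        return t;
    }

    // Stand-in for the client: ask the "namenode" (here just a map from
    // block id to port) where the block is, then read it over a socket.
    static byte[] readBlock(Map<String, Integer> toyNameNode, String blockId)
            throws IOException {
        int port = toyNameNode.get(blockId);
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             InputStream in = s.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            // Read until the remote side closes the connection.
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    // Wires the pieces together on the loopback interface.
    static String demo() throws Exception {
        byte[] block = "hello from a remote block".getBytes(StandardCharsets.UTF_8);
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread dataNode = startToyDataNode(server, block);
            Map<String, Integer> toyNameNode = new HashMap<>();
            toyNameNode.put("blk_1", server.getLocalPort());
            byte[] got = readBlock(toyNameNode, "blk_1");
            dataNode.join();
            return new String(got, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The sketch only shows the shape of the interaction (metadata lookup first, then a direct data connection); it leaves out everything that makes the real protocol interesting, such as replicas, checksums and retries.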

Colin

Re: non-local map task input

Posted by Grandl Robert <rg...@yahoo.com>.
Can anyone help me with this, please?

Thanks,
Robert


