Posted to common-dev@hadoop.apache.org by Dhaivat Pandya <dh...@gmail.com> on 2014/04/07 03:09:10 UTC

Hadoop v1.8 data transfer protocol

Hi,

I'm trying to figure out how data is transferred between client and
DataNode in Hadoop v1.8.

This is my understanding so far:

The client first fires an OP_READ_BLOCK request. The DataNode responds with
a status code, checksum header, chunk offset, packet length, sequence
number, the last packet boolean, the length and the data (in that order).
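For what it's worth, a minimal sketch of how a header in that order could be
parsed with DataInputStream. The class name, field names, and field widths
below are my assumptions based on the description above, not the actual 0.18
types; the real widths should be checked against BlockSender/RemoteBlockReader:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical per-packet header, assuming the field order described above
// (after the one-time status code and checksum header) and typical Java
// primitive widths: int packet length, long offset, long seqno,
// boolean last-packet flag, int data length.
public class PacketHeaderSketch {
    int packetLen;      // length of the whole packet region
    long offsetInBlock; // chunk offset within the block
    long seqno;         // sequence number
    boolean lastPacket; // true for the final packet of the block
    int dataLen;        // length of the payload bytes in this packet

    static PacketHeaderSketch read(DataInputStream in) throws IOException {
        PacketHeaderSketch h = new PacketHeaderSketch();
        h.packetLen = in.readInt();
        h.offsetInBlock = in.readLong();
        h.seqno = in.readLong();
        h.lastPacket = in.readBoolean();
        h.dataLen = in.readInt();
        return h;
    }
}
```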

However, I'm running into an issue. First of all, which of these lengths
describes the length of the data? I tried both PacketLength and Length,
but it seems that both leave data on the stream (I tried to "cat" a file
containing the numbers 1-1000).

Also, how does the DataNode signal the start of another packet? After
"Length" number of bytes have been read, I assumed that the header would be
repeated, but this is not the case (I'm not getting sane values for any of
the fields of the header).

I've looked through the DataXceiver, BlockSender, DFSClient
(RemoteBlockReader) classes but I still can't quite grasp how this data
transfer is conducted.

Any help would be appreciated,

Dhaivat Pandya

Re: Hadoop v1.8 data transfer protocol

Posted by Dhaivat Pandya <dh...@gmail.com>.
Hi Harsh,

I did mean 0.18 - sorry about the typo.

I read through the BlockSender.sendChunks method once again and noticed
that I wasn't reading the checksum byte array correctly in my code.
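In case it helps anyone hitting the same snag: the checksum bytes sit between
the header and the data, so they have to be consumed before the payload. A
sketch of the sizing arithmetic, assuming fixed-size checksums (e.g. 4-byte
CRC32) over fixed-size chunks; the method and parameter names here are mine,
not Hadoop's:

```java
// Hypothetical sizing of a packet's checksum region: one checksum of
// checksumSize bytes per chunk of up to bytesPerChecksum data bytes.
public class ChecksumRegion {
    static int checksumBytes(int dataLen, int bytesPerChecksum, int checksumSize) {
        // ceil(dataLen / bytesPerChecksum) chunks, each with one checksum
        int numChunks = (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;
        return numChunks * checksumSize;
    }
}
```

So a reader would skip (or verify) that many bytes after the header, then read
the data bytes, then expect the next packet's header.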

Thanks for the help,

Dhaivat Pandya



On Sun, Apr 6, 2014 at 8:59 PM, Harsh J <ha...@cloudera.com> wrote:

> There's been no Apache Hadoop release versioned v1.8 historically, nor
> is one upcoming. Do you mean 0.18?
>
> Either way, can you point to the specific code lines in BlockSender
> which have you confused? The sendBlock and sendPacket methods would
> interest you I assume, but they appear to be well constructed/named
> internally and commented in a few important spots.
>
> On Mon, Apr 7, 2014 at 6:39 AM, Dhaivat Pandya <dh...@gmail.com>
> wrote:
> > Hi,
> >
> > I'm trying to figure out how data is transferred between client and
> > DataNode in Hadoop v1.8.
> >
> > This is my understanding so far:
> >
> > The client first fires an OP_READ_BLOCK request. The DataNode responds
> with
> > a status code, checksum header, chunk offset, packet length, sequence
> > number, the last packet boolean, the length and the data (in that order).
> >
> > However, I'm running into an issue. First of all, which of these lengths
> > describes the length of the data? I tried both PacketLength and Length,
> > but it seems that both leave data on the stream (I tried to "cat" a file
> > containing the numbers 1-1000).
> >
> > Also, how does the DataNode signal the start of another packet? After
> > "Length" number of bytes have been read, I assumed that the header would
> be
> > repeated, but this is not the case (I'm not getting sane values for any
> of
> > the fields of the header).
> >
> > I've looked through the DataXceiver, BlockSender, DFSClient
> > (RemoteBlockReader) classes but I still can't quite grasp how this data
> > transfer is conducted.
> >
> > Any help would be appreciated,
> >
> > Dhaivat Pandya
>
>
>
> --
> Harsh J
>

Re: Hadoop v1.8 data transfer protocol

Posted by Harsh J <ha...@cloudera.com>.
There's been no Apache Hadoop release versioned v1.8 historically, nor
is one upcoming. Do you mean 0.18?

Either way, can you point to the specific code lines in BlockSender
which have you confused? The sendBlock and sendPacket methods would
interest you I assume, but they appear to be well constructed/named
internally and commented in a few important spots.

On Mon, Apr 7, 2014 at 6:39 AM, Dhaivat Pandya <dh...@gmail.com> wrote:
> Hi,
>
> I'm trying to figure out how data is transferred between client and
> DataNode in Hadoop v1.8.
>
> This is my understanding so far:
>
> The client first fires an OP_READ_BLOCK request. The DataNode responds with
> a status code, checksum header, chunk offset, packet length, sequence
> number, the last packet boolean, the length and the data (in that order).
>
> However, I'm running into an issue. First of all, which of these lengths
> describes the length of the data? I tried both PacketLength and Length,
> but it seems that both leave data on the stream (I tried to "cat" a file
> containing the numbers 1-1000).
>
> Also, how does the DataNode signal the start of another packet? After
> "Length" number of bytes have been read, I assumed that the header would be
> repeated, but this is not the case (I'm not getting sane values for any of
> the fields of the header).
>
> I've looked through the DataXceiver, BlockSender, DFSClient
> (RemoteBlockReader) classes but I still can't quite grasp how this data
> transfer is conducted.
>
> Any help would be appreciated,
>
> Dhaivat Pandya



-- 
Harsh J