Posted to user@hadoop.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2013/06/03 20:09:12 UTC

HDFS interfaces

Hello,
It is stated in the "HDFS architecture guide" (https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html) that 


HDFS provides interfaces for applications to move themselves closer to where the data is located. 


What are these interfaces, and where are they in the source code? Is there a manual for them?

 
Regards,
Mahmood

Re: HDFS interfaces

Posted by Jay Vyas <ja...@gmail.com>.

Looking at the source, it appears that in HDFS the NameNode serves this
information directly to the client: it ultimately communicates block
locations to the DFSClient, which is used by the
DistributedFileSystem.

  /**
   * @see ClientProtocol#getBlockLocations(String, long, long)
   */
  static LocatedBlocks callGetBlockLocations(ClientProtocol namenode,
      String src, long start, long length)
      throws IOException {
    try {
      return namenode.getBlockLocations(src, start, length);
    } catch(RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class,
                                     FileNotFoundException.class,
                                     UnresolvedPathException.class);
    }
  }




On Tue, Jun 4, 2013 at 2:00 AM, Mahmood Naderan <nt...@yahoo.com> wrote:

> There are many instances of getFileBlockLocations in hadoop/fs. Can you
> explain which one is the main one?
> > It must be combined with a method of logically splitting the input data
> > along block boundaries, and of launching tasks on worker nodes that are
> > close to the data splits
> Is this a user-level task or a system-level task?
>
>
> Regards,
> Mahmood
>
>
>   ------------------------------
>  *From:* John Lilley <jo...@redpoint.net>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>; Mahmood Naderan <
> nt_mahmood@yahoo.com>
> *Sent:* Tuesday, June 4, 2013 3:28 AM
> *Subject:* RE: HDFS interfaces
>
>  Mahmood,
>
> It is in the FileSystem interface.
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations%28org.apache.hadoop.fs.Path,%20long,%20long%29
>
> This by itself is not sufficient for application programmers to make good
> use of data locality.  It must be combined with a method of logically
> splitting the input data along block boundaries, and of launching tasks on
> worker nodes that are close to the data splits.  MapReduce does both of
> these things internally along with the file-format input classes.  For an
> application to do so directly, see the new YARN-based interfaces
> ApplicationMaster and ResourceManager.  These are however very new and
> there is little documentation or examples.
>
> john
>
>  *From:* Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
> *Sent:* Monday, June 03, 2013 12:09 PM
> *To:* user@hadoop.apache.org
> *Subject:* HDFS interfaces
>
>  Hello,
>  It is stated in the "HDFS architecture guide" (
> https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html) that
>
>  *HDFS provides interfaces for applications to move themselves closer to
> where the data is located. *
>
>  What are these interfaces and where they are in the source code? Is
> there any manual for the interfaces?
>
>   Regards,
> Mahmood
>
>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com

RE: HDFS interfaces

Posted by John Lilley <jo...@redpoint.net>.

When you use the HDFS client interface to read a file, it automatically figures out which datanodes to contact for which blocks. There isn't really a "main" block. However, I have read that the first location listed for each block is the "recommended" one for an outside client to read from. Normally, an outside client doesn't need to know this information at all, as the HDFS file interface takes care of it. An "inside" application such as MapReduce *does* need to know this information so that it can run tasks on nodes that are "close" to the data split being processed. If you are writing a custom ApplicationMaster using YARN, you will also want to know this.
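
To make the "run tasks on nodes that are close to the data" idea concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. LocalityAssignSketch and assign are hypothetical names, and the host names are made up. In a real application the host lists would come from getFileBlockLocations, and a YARN ApplicationMaster would express these preferences in its container requests to the ResourceManager rather than picking nodes itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: picks a worker for each split,
// preferring a worker that already holds a replica of the split's block.
final class LocalityAssignSketch {

    // splitHosts.get(i) lists the hosts holding split i, in the order reported.
    // Returns the chosen worker for each split, in the same order.
    static List<String> assign(List<List<String>> splitHosts, List<String> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("no workers available");
        }
        List<String> chosen = new ArrayList<>();
        int fallback = 0; // round-robin cursor for splits with no local worker
        for (List<String> hosts : splitHosts) {
            String pick = null;
            // Try hosts in listed order, so the first location wins when usable.
            for (String host : hosts) {
                if (workers.contains(host)) {
                    pick = host;
                    break;
                }
            }
            // No worker holds this split: the task has to read remotely.
            if (pick == null) {
                pick = workers.get(fallback % workers.size());
                fallback++;
            }
            chosen.add(pick);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<List<String>> splitHosts = Arrays.asList(
            Arrays.asList("node1", "node2"),
            Arrays.asList("node3", "node1"),
            Arrays.asList("node9"));
        List<String> workers = Arrays.asList("node1", "node3", "node4");
        System.out.println(assign(splitHosts, workers));
    }
}
```

This sketch ignores load balancing and rack awareness entirely; it only shows why the scheduling side needs the host list that an ordinary reading client can ignore.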

John

From: Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
Sent: Tuesday, June 04, 2013 12:01 AM
To: user@hadoop.apache.org
Subject: Re: HDFS interfaces

There are many instances of getFileBlockLocations in hadoop/fs. Can you explain which one is the main one?
> It must be combined with a method of logically splitting the input data along block boundaries, and of launching tasks on worker nodes that are close to the data splits
Is this a user-level task or a system-level task?

Regards,
Mahmood

________________________________
From: John Lilley <jo...@redpoint.net>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>; Mahmood Naderan <nt...@yahoo.com>
Sent: Tuesday, June 4, 2013 3:28 AM
Subject: RE: HDFS interfaces

Mahmood,

It is in the FileSystem interface.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)

This by itself is not sufficient for application programmers to make good use of data locality.  It must be combined with a method of logically splitting the input data along block boundaries, and of launching tasks on worker nodes that are close to the data splits.  MapReduce does both of these things internally along with the file-format input classes.  For an application to do so directly, see the new YARN-based interfaces ApplicationMaster and ResourceManager.  These are however very new and there is little documentation or examples.

john

From: Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
Sent: Monday, June 03, 2013 12:09 PM
To: user@hadoop.apache.org
Subject: HDFS interfaces

Hello,
It is stated in the "HDFS architecture guide" (https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html) that

HDFS provides interfaces for applications to move themselves closer to where the data is located.

What are these interfaces and where they are in the source code? Is there any manual for the interfaces?

Regards,
Mahmood

Re: HDFS interfaces

Posted by Mahmood Naderan <nt...@yahoo.com>.

There are many instances of getFileBlockLocations in hadoop/fs. Can you explain which one is the main one?

> It must be combined with a method of logically splitting the input data along block boundaries, and of launching tasks on worker nodes that are close to the data splits
Is this a user-level task or a system-level task?



Regards,
Mahmood



________________________________
 From: John Lilley <jo...@redpoint.net>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>; Mahmood Naderan <nt...@yahoo.com> 
Sent: Tuesday, June 4, 2013 3:28 AM
Subject: RE: HDFS interfaces
 


 
Mahmood,
 
It is in the FileSystem interface.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)
 
This by itself is not sufficient for application programmers to make good use of data locality.  It must be combined with a method of logically splitting the input data along block boundaries, and of launching tasks on worker nodes that are close to the data splits.  MapReduce does both of these things internally along with the file-format input classes.  For an application to do so directly, see the new YARN-based interfaces ApplicationMaster and ResourceManager.  These are however very new and there is little documentation or examples.
 
john
 
From: Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
Sent: Monday, June 03, 2013 12:09 PM
To: user@hadoop.apache.org
Subject: HDFS interfaces
 
Hello,
It is stated in the "HDFS architecture guide" (https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html) that 
 
HDFS provides interfaces for applications to move themselves closer to where the data is located. 
 
What are these interfaces and where they are in the source code? Is there any manual for the interfaces?
 
Regards,
Mahmood

RE: HDFS interfaces

Posted by John Lilley <jo...@redpoint.net>.

Mahmood,

It is in the FileSystem interface.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)

This by itself is not sufficient for application programmers to make good use of data locality. It must be combined with a method of logically splitting the input data along block boundaries, and of launching tasks on worker nodes that are close to the data splits. MapReduce does both of these things internally, together with the file-format input classes. For an application to do so directly, see the new YARN-based interfaces ApplicationMaster and ResourceManager. These are, however, very new, and there is little documentation and few examples.
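
To make the two extra steps concrete, here is a minimal sketch of "split along block boundaries, then place each split near its data". It does not call Hadoop at all: the class LocalitySketch and its Block, Split, and planSplits names are hypothetical, and Block only stands in for the offset, length, and host list that a BlockLocation returned by FileSystem.getFileBlockLocations reports. Picking the least-loaded replica host is an illustrative choice, not a description of what MapReduce or YARN actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one split per block, each assigned to a host that holds a replica. */
public class LocalitySketch {

    /** Stand-in for the offset, length, and hosts reported for one block. */
    public static final class Block {
        public final long offset;
        public final long length;
        public final String[] hosts;

        public Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** A byte range of the file plus the host chosen to process it. */
    public static final class Split {
        public final long start;
        public final long length;
        public final String host;

        public Split(long start, long length, String host) {
            this.start = start;
            this.length = length;
            this.host = host;
        }
    }

    /** Creates one split per block and assigns it to the least-loaded replica host. */
    public static List<Split> planSplits(List<Block> blocks) {
        Map<String, Integer> load = new HashMap<>();   // splits assigned so far, per host
        List<Split> splits = new ArrayList<>();
        for (Block b : blocks) {
            String best = null;
            for (String h : b.hosts) {
                // Ties keep the earlier host in the replica list.
                if (best == null || load.getOrDefault(h, 0) < load.getOrDefault(best, 0)) {
                    best = h;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("block at offset " + b.offset + " has no hosts");
            }
            load.merge(best, 1, Integer::sum);
            splits.add(new Split(b.offset, b.length, best));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Made-up layout: a 138 MB file stored as three blocks with three replicas each.
        List<Block> blocks = Arrays.asList(
                new Block(0, 64 * mb, "node1", "node2", "node3"),
                new Block(64 * mb, 64 * mb, "node1", "node2", "node4"),
                new Block(128 * mb, 10 * mb, "node1", "node3", "node4"));
        for (Split s : planSplits(blocks)) {
            System.out.println(s.start + "+" + s.length + " -> " + s.host);
        }
    }
}
```

In a real application the Block list would be filled from the BlockLocation array (getOffset(), getLength(), getHosts()), and the chosen host would be passed to whatever scheduler launches the task, for example as a locality hint in a YARN container request.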

john

From: Mahmood Naderan [mailto:nt_mahmood@yahoo.com]
Sent: Monday, June 03, 2013 12:09 PM
To: user@hadoop.apache.org
Subject: HDFS interfaces

Hello,
It is stated in the "HDFS architecture guide" (https://hadoop.apache.org/docs/r1.0.4/hdfs_design.html) that

HDFS provides interfaces for applications to move themselves closer to where the data is located.

What are these interfaces, and where are they in the source code? Is there any manual for the interfaces?

Regards,
Mahmood
