Posted to mapreduce-user@hadoop.apache.org by Demai Ni <ni...@gmail.com> on 2014/08/26 00:42:55 UTC

Local file system to access hdfs blocks

Hi, folks,

New in this area; hoping to get a couple of pointers.

I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).

I am wondering whether there is an interface to get each HDFS block's information in terms of the local file system.

For example, I can use "hadoop fsck /tmp/test.txt -files -blocks -racks" to get the block ID and its replicas on the nodes, such as: repl=3 [/rack/hdfs01, /rack/hdfs02...]

 With such info, is there a way to
1) log in to hdfs01 and read the block directly at the local file system level?


Thanks

Demai on the run
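
For reference, a minimal Java sketch (not from this thread) of the public API discussed in the replies below, FileSystem#getFileBlockLocations; the path and class name are placeholders, and it only reports which datanodes hold each block, not where the block file lives on a datanode's local disk:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockHosts {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/test.txt");           // placeholder path
    FileStatus status = fs.getFileStatus(file);
    // One BlockLocation per block; getHosts() lists the datanodes holding replicas.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset() + " len=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
    fs.close();
  }
}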

Re: Tez and MapReduce

Posted by Bing Jiang <ji...@gmail.com>.
By the way, mapreduce.framework.name can be set to yarn or yarn-tez; that
setting makes a difference.


2014-09-02 8:24 GMT+08:00 jay vyas <ja...@gmail.com>:

> Yes. As an example of running a MapReduce job followed by a Tez job, see our
> last post on this:
> https://blogs.apache.org/bigtop/entry/testing_apache_tez_with_apache .
> In that Bigtop/Tez testing blog post you can see that it is easy to confirm
> that Tez is being used from the web UI.
>
> From TezClient.java:
>
>
> /**
>  * TezClient is used to submit Tez DAGs for execution. DAGs are executed via
>  * a Tez App Master. TezClient can run the App Master in session or
>  * non-session mode. <br>
>  * In non-session mode, each DAG is executed in a different App Master that
>  * exits after the DAG execution completes. <br>
>  * In session mode, the TezClient creates a single instance of the App Master
>  * and all DAGs are submitted to the same App Master.<br>
>  * Session mode may give better performance when a series of DAGs need to be
>  * executed because it enables resource re-use across those DAGs. Non-session
>  * mode should be used when the user wants to submit a single DAG or wants to
>  * disconnect from the cluster after submitting a set of unrelated DAGs. <br>
>  * If API recommendations are followed, then the choice of running in session
>  * or non-session mode is transparent to writing the application. By changing
>  * the session mode configuration, the same application can be running in
>  * session or non-session mode.
>  */
>
>
>
> On Mon, Sep 1, 2014 at 12:43 PM, Alexander Pivovarov <apivovarov@gmail.com
> > wrote:
>
>> e.g. in hive to switch engines
>> set hive.execution.engine=mr;
>> or
>> set hive.execution.engine=tez;
>>
>> tez is faster especially on complex queries.
>> On Aug 31, 2014 10:33 PM, "Adaryl "Bob" Wakefield, MBA" <
>> adaryl.wakefield@hotmail.com> wrote:
>>
>>>   Can Tez and MapReduce live together and get along in the same cluster?
>>> B.
>>>
>>
>
>
> --
> jay vyas
>
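
A minimal sketch of what the mapreduce.framework.name switch mentioned above looks like from a Java MapReduce driver, assuming the Tez client jars and tez-site.xml are already on the classpath; the class and job names are placeholders and the usual mapper/reducer setup is omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitOnTez {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "yarn" runs the job as classic MapReduce on YARN;
    // "yarn-tez" routes the same MapReduce job through the Tez runtime.
    conf.set("mapreduce.framework.name", "yarn-tez");
    Job job = Job.getInstance(conf, "mr-job-on-tez");
    // job.setJarByClass(...), mapper/reducer classes and input/output paths
    // would be configured here as in any MapReduce job.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}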

Re: Tez and MapReduce

Posted by jay vyas <ja...@gmail.com>.
Yes. As an example of running a MapReduce job followed by a Tez job, see our
last post on this:
https://blogs.apache.org/bigtop/entry/testing_apache_tez_with_apache .
In that Bigtop/Tez testing blog post you can see that it is easy to confirm
that Tez is being used from the web UI.

From TezClient.java:


/**
 * TezClient is used to submit Tez DAGs for execution. DAGs are executed via
 * a Tez App Master. TezClient can run the App Master in session or
 * non-session mode. <br>
 * In non-session mode, each DAG is executed in a different App Master that
 * exits after the DAG execution completes. <br>
 * In session mode, the TezClient creates a single instance of the App Master
 * and all DAGs are submitted to the same App Master.<br>
 * Session mode may give better performance when a series of DAGs need to be
 * executed because it enables resource re-use across those DAGs. Non-session
 * mode should be used when the user wants to submit a single DAG or wants to
 * disconnect from the cluster after submitting a set of unrelated DAGs. <br>
 * If API recommendations are followed, then the choice of running in session
 * or non-session mode is transparent to writing the application. By changing
 * the session mode configuration, the same application can be running in
 * session or non-session mode.
 */



On Mon, Sep 1, 2014 at 12:43 PM, Alexander Pivovarov <ap...@gmail.com>
wrote:

> e.g. in hive to switch engines
> set hive.execution.engine=mr;
> or
> set hive.execution.engine=tez;
>
> tez is faster especially on complex queries.
> On Aug 31, 2014 10:33 PM, "Adaryl "Bob" Wakefield, MBA" <
> adaryl.wakefield@hotmail.com> wrote:
>
>>   Can Tez and MapReduce live together and get along in the same cluster?
>> B.
>>
>


-- 
jay vyas
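
A minimal sketch of the session-mode usage described in the javadoc above, assuming the Tez 0.5+ client API; the DAG body is omitted and the names are placeholders:

import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.TezConfiguration;
import org.apache.tez.dag.api.client.DAGClient;

public class TezSessionExample {
  public static void main(String[] args) throws Exception {
    TezConfiguration tezConf = new TezConfiguration();
    // true = one shared App Master for all submitted DAGs; false = one AM per DAG.
    tezConf.setBoolean(TezConfiguration.TEZ_AM_SESSION_MODE, true);
    TezClient tezClient = TezClient.create("example-session", tezConf);
    tezClient.start();
    try {
      DAG dag = DAG.create("dag-1");   // vertices and edges would be added here
      DAGClient dagClient = tezClient.submitDAG(dag);
      dagClient.waitForCompletion();   // block until this DAG finishes
    } finally {
      tezClient.stop();                // end the session and shut down the App Master
    }
  }
}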

Re: Tez and MapReduce

Posted by Alexander Pivovarov <ap...@gmail.com>.
E.g., in Hive, to switch engines:
set hive.execution.engine=mr;
or
set hive.execution.engine=tez;

Tez is faster, especially on complex queries.
On Aug 31, 2014 10:33 PM, "Adaryl "Bob" Wakefield, MBA" <
adaryl.wakefield@hotmail.com> wrote:

>   Can Tez and MapReduce live together and get along in the same cluster?
> B.
>

Re: Tez and MapReduce

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi,

Yes, they can on YARN.

- Tsuyoshi

On Mon, Sep 1, 2014 at 2:32 PM, Adaryl "Bob" Wakefield, MBA
<ad...@hotmail.com> wrote:
> Can Tez and MapReduce live together and get along in the same cluster?
> B.



-- 
- Tsuyoshi

Tez and MapReduce

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
Can Tez and MapReduce live together and get along in the same cluster?
B.

Re: Local file system to access hdfs blocks

Posted by Demai Ni <ni...@gmail.com>.
Stanley, 

Thanks. 

Btw, I found this JIRA, HDFS-2246, which probably matches what I am looking for.

Demai on the run

On Aug 28, 2014, at 11:34 PM, Stanley Shi <ss...@pivotal.io> wrote:

> BP-13-7914115-10.122.195.197-14909166276345 is the blockpool information
> blk_1073742025 is the block name;
> 
> these names are "private" to the HDFS system and users should not use them, right?
> But if you really want to know this, you can check the fsck code to see whether they are available;
> 
> 
> On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <ni...@gmail.com> wrote:
>> Stanley and all,
>> 
>> thanks. I will write a client application to explore this path. A quick question again. 
>> Using the fsck command, I can retrieve all the necessary info
>> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
>> .....
>>  BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
>> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>> 
>> However, using getFileBlockLocations(), I can't get the block name/id info, such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025.
>> It seems BlockLocation doesn't provide that as public info here: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>> 
>> Is there another entry point? Something fsck is using? Thanks.
>> 
>> Demai
>> 
>> 
>> 
>> 
>> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>> As far as I know, there's no combination of Hadoop APIs that can do that.
>>> You can easily get the location of the block (on which DN), but there's no way to get the local address of that block file.
>>> 
>>> 
>>> 
>>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <ni...@gmail.com> wrote:
>>>> Yehia,
>>>> 
>>>> No problem at all. I really appreciate your willingness to help. Yeah, now I am able to get such information through two steps: the first step is either hadoop fsck or getFileBlockLocations(), and then I search the local filesystem; my cluster is using the default from CDH, which is /dfs/dn.
>>>> 
>>>> I would like to do it programmatically, so I am wondering whether someone has already done it, or better, whether a Hadoop API call is already implemented for this exact purpose.
>>>> 
>>>> Demai
>>>> 
>>>> 
>>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y....@gmail.com> wrote:
>>>>> Hi Demai,
>>>>> 
>>>>> Sorry, I missed that you had already tried this out. I think you can construct the block location on the local file system if you have the block pool id and the block id. If you are using the Cloudera distribution, the default location is under /dfs/dn (the value of the dfs.data.dir / dfs.datanode.data.dir configuration keys).
>>>>> 
>>>>> Thanks
>>>>> Yehia 
>>>>> 
>>>>> 
>>>>> On 27 August 2014 21:20, Yehia Elshater <y....@gmail.com> wrote:
>>>>>> Hi Demai,
>>>>>> 
>>>>>> You can use fsck utility like the following:
>>>>>> 
>>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>> 
>>>>>> This will display all the information you need about the blocks of your file.
>>>>>> 
>>>>>> Hope it helps.
>>>>>> Yehia
>>>>>> 
>>>>>> 
>>>>>> On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:
>>>>>>> Hi, Stanley,
>>>>>>> 
>>>>>>> Many thanks. Your method works. For now, I have a two-step approach:
>>>>>>> 1) getFileBlockLocations to grab the HDFS BlockLocation[]
>>>>>>> 2) use a local file system call (like the find command) to match the block to files on the local file system.
>>>>>>> 
>>>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>>> 
>>>>>>> Demai on the run
>>>>>>> 
>>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>>>>>> 
>>>>>>>> I am not sure this is what you want but you can try this shell command:
>>>>>>>> 
>>>>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>>>>>>>>> Hi, folks,
>>>>>>>>> 
>>>>>>>>> New in this area; hoping to get a couple of pointers.
>>>>>>>>> 
>>>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3)
>>>>>>>>> 
>>>>>>>>> I am wondering whether there is an interface to get each HDFS block's information in terms of the local file system.
>>>>>>>>> 
>>>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks -racks" to get the block ID and its replicas on the nodes, such as: repl=3 [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>>> 
>>>>>>>>>  With such info, is there a way to
>>>>>>>>> 1) log in to hdfs01 and read the block directly at the local file system level?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> Demai on the run
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Regards,
>>>>>>>> Stanley Shi,
>>>>>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Stanley Shi,
>>> 
> 
> 
> 
> -- 
> Regards,
> Stanley Shi,
> 
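
For reference, a minimal Java sketch of step 2 of the two-step approach discussed in this thread (searching a datanode data directory for a block file); it assumes you already have the block name from fsck output and the datanode data directory (/dfs/dn is the CDH default mentioned above), that it runs on the datanode itself, and the class name is just for illustration:

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class FindBlockFile {
  public static void main(String[] args) throws IOException {
    final String blockName = args.length > 0 ? args[0] : "blk_1073742025"; // from fsck
    Path dataDir = Paths.get(args.length > 1 ? args[1] : "/dfs/dn");       // dfs.datanode.data.dir
    // Equivalent of: find /dfs/dn -name 'blk_1073742025*'
    Files.walkFileTree(dataDir, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        if (file.getFileName().toString().startsWith(blockName)) {
          System.out.println(file.toAbsolutePath()); // the block file and its .meta file
        }
        return FileVisitResult.CONTINUE;
      }
    });
  }
}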

Re: Local file system to access hdfs blocks

Posted by Stanley Shi <ss...@pivotal.io>.
BP-13-7914115-10.122.195.197-14909166276345 is the blockpool information;
blk_1073742025 is the block name.

These names are "private" to the HDFS system and users should not use them,
right?
But if you really want to know this, you can check the fsck code to see
whether they are available.


On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <ni...@gmail.com> wrote:

> Stanley and all,
>
> thanks. I will write a client application to explore this path. A quick
> question again.
> Using the fsck command, I can retrieve all the necessary info
> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
> .....
>  BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025
> len=8 repl=2
> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>
> However, using getFileBlockLocations(), I can't get the block name/id
> info, such as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025.
> It seems BlockLocation doesn't provide that as public info here:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>
> Is there another entry point? Something fsck is using? Thanks.
>
> Demai
>
>
>
>
> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <ss...@pivotal.io> wrote:
>
>> As far as I know, there's no combination of Hadoop APIs that can do that.
>> You can easily get the location of the block (on which DN), but there's
>> no way to get the local address of that block file.
>>
>>
>>
>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <ni...@gmail.com> wrote:
>>
>>> Yehia,
>>>
>>> No problem at all. I really appreciate your willingness to help. Yeah,
>>> now I am able to get such information through two steps: the first step
>>> is either hadoop fsck or getFileBlockLocations(), and then I search
>>> the local filesystem; my cluster is using the default from CDH, which is
>>> /dfs/dn.
>>>
>>> I would like to do it programmatically, so I am wondering whether someone
>>> has already done it, or better, whether a Hadoop API call is already
>>> implemented for this exact purpose.
>>>
>>> Demai
>>>
>>>
>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y....@gmail.com>
>>> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> Sorry, I missed that you had already tried this out. I think you can
>>>> construct the block location on the local file system if you have the block
>>>> pool id and the block id. If you are using the Cloudera distribution, the
>>>> default location is under /dfs/dn (the value of the dfs.data.dir /
>>>> dfs.datanode.data.dir configuration keys).
>>>>
>>>> Thanks
>>>> Yehia
>>>>
>>>>
>>>> On 27 August 2014 21:20, Yehia Elshater <y....@gmail.com> wrote:
>>>>
>>>>> Hi Demai,
>>>>>
>>>>> You can use fsck utility like the following:
>>>>>
>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>
>>>>> This will display all the information you need about the blocks of
>>>>> your file.
>>>>>
>>>>> Hope it helps.
>>>>> Yehia
>>>>>
>>>>>
>>>>> On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:
>>>>>
>>>>>> Hi, Stanley,
>>>>>>
>>>>>> Many thanks. Your method works. For now, I have a two-step
>>>>>> approach:
>>>>>> 1) getFileBlockLocations to grab the HDFS BlockLocation[]
>>>>>> 2) use a local file system call (like the find command) to match the block
>>>>>> to files on the local file system.
>>>>>>
>>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>>
>>>>>> Demai on the run
>>>>>>
>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>>>>>
>>>>>> I am not sure this is what you want but you can try this shell
>>>>>> command:
>>>>>>
>>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, folks,
>>>>>>>
>>>>>>> New in this area; hoping to get a couple of pointers.
>>>>>>>
>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3)
>>>>>>>
>>>>>>> I am wondering whether there is an interface to get each HDFS block's
>>>>>>> information in terms of the local file system.
>>>>>>>
>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>>> -racks" to get the block ID and its replicas on the nodes, such as: repl=3
>>>>>>> [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>
>>>>>>>  With such info, is there a way to
>>>>>>> 1) log in to hdfs01 and read the block directly at the local file system
>>>>>>> level?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Demai on the run
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> *Stanley Shi,*
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Regards,
>> *Stanley Shi,*
>>
>>
>


-- 
Regards,
*Stanley Shi,*
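
On the "is there another entry point, something fsck is using?" question: the block pool id and block name that fsck prints come from the NameNode's block locations, which the HDFS client internals also expose. A sketch along those lines follows; DFSClient is a private, unstable API (in line with the "these names are private" caveat above), so it can break across versions, and the path and class name are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class ListBlockNames {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI nn = FileSystem.getDefaultUri(conf);         // fs.defaultFS
    DFSClient client = new DFSClient(nn, conf);      // private API, subject to change
    String src = "/tmp/list2.txt";                   // placeholder path
    long len = client.getFileInfo(src).getLen();
    for (LocatedBlock lb : client.getLocatedBlocks(src, 0, len).getLocatedBlocks()) {
      StringBuilder hosts = new StringBuilder();
      for (DatanodeInfo dn : lb.getLocations()) {
        hosts.append(dn.getXferAddr()).append(' ');
      }
      // Prints e.g. BP-...:blk_1073742025 -> 10.122.195.198:50010 10.122.195.196:50010
      System.out.println(lb.getBlock().getBlockPoolId() + ":"
          + lb.getBlock().getBlockName() + " -> " + hosts.toString().trim());
    }
    client.close();
  }
}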

Re: Local file system to access hdfs blocks

Posted by Demai Ni <ni...@gmail.com>.
Stanley and all,

thanks. I will write a client application to explore this path. A quick
question again.
Using the fsck command, I can retrieve all the necessary info
$ hadoop fsck /tmp/list2.txt -files -blocks -racks
.....
 *BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025* len=8 repl=2
[/default/10.122.195.198:50010, /default/10.122.195.196:50010]

However, using getFileBlockLocations(), I can't get the block name/id info,
such as
*BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025*. It seems the
BlockLocation class doesn't expose that info publicly:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html

Is there another entry point? Something fsck itself is using? Thanks.
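
One lower-level entry point worth looking at (a hedged suggestion only, since
it is a private, unstable API) is DFSClient#getLocatedBlocks(), which returns
the same LocatedBlock objects that fsck prints. A rough, untested sketch
against Hadoop 2.x internals, with a placeholder NameNode URI:

// Sketch only: DFSClient, LocatedBlock and friends are private HDFS APIs.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class LocatedBlocksDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DFSClient client = new DFSClient(URI.create("hdfs://namenode:8020"), conf);
    try {
      // Same fields fsck prints: block pool id, block id, length, replica hosts.
      for (LocatedBlock lb : client
          .getLocatedBlocks("/tmp/list2.txt", 0, Long.MAX_VALUE)
          .getLocatedBlocks()) {
        StringBuilder line = new StringBuilder();
        line.append(lb.getBlock().getBlockPoolId())
            .append(":blk_").append(lb.getBlock().getBlockId())
            .append(" len=").append(lb.getBlock().getNumBytes())
            .append(" replicas=");
        for (DatanodeInfo dn : lb.getLocations()) {
          line.append(dn.getXferAddr()).append(' ');
        }
        System.out.println(line);
      }
    } finally {
      client.close();
    }
  }
}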

Demai




On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <ss...@pivotal.io> wrote:

> As far as I know, there's no combination of hadoop API can do that.
> You can easily get the location of the block (on which DN), but there's no
> way to get the local address of that block file.
>
>
>
> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> Yehia,
>>
>> No problem at all. I really appreciate your willingness to help. Yeah.
>> now I am able to get such information through two steps, and the first step
>> will be either hadoop fsck or getFileBlockLocations(). and then search
>> the local filesystem, my cluster is using the default from CDH, which is
>> /dfs/dn
>>
>> I would like to it programmatically, so wondering whether someone already
>> done it? or maybe better a hadoop API call already implemented for this
>> exact purpose
>>
>> Demai
>>
>>
>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y....@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> Sorry, I missed that you are already tried this out. I think you can
>>> construct the block location on the local file system if you have the block
>>> pool id and the block id. If you are using cloudera distribution, the
>>> default location is under /dfs/dn ( the value of dfs.data.dir,
>>> dfs.datanode.data.dir configuration keys).
>>>
>>> Thanks
>>> Yehia
>>>
>>>
>>> On 27 August 2014 21:20, Yehia Elshater <y....@gmail.com> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> You can use fsck utility like the following:
>>>>
>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>
>>>> This will display all the information you need about the blocks of your
>>>> file.
>>>>
>>>> Hope it helps.
>>>> Yehia
>>>>
>>>>
>>>> On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:
>>>>
>>>>> Hi, Stanley,
>>>>>
>>>>> Many thanks. Your method works. For now, I can have two steps approach:
>>>>> 1) getFileBlockLocations to grab hdfs BlockLocation[]
>>>>> 2) use local file system call(like find command) to match the block to
>>>>> files on local file system .
>>>>>
>>>>> Maybe there is an existing Hadoop API to return such info in already?
>>>>>
>>>>> Demai on the run
>>>>>
>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>>>>
>>>>> I am not sure this is what you want but you can try this shell command:
>>>>>
>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>
>>>>>
>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>>>>>
>>>>>> Hi, folks,
>>>>>>
>>>>>> New in this area. Hopefully to get a couple pointers.
>>>>>>
>>>>>> I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)
>>>>>>
>>>>>> I am wondering whether there is a interface to get each hdfs block
>>>>>> information in the term of local file system.
>>>>>>
>>>>>> For example, I can use "Hadoop fsck /tmp/test.txt -files -blocks
>>>>>> -racks" to get blockID and its replica on the nodes, such as: repl =3[
>>>>>> /rack/hdfs01, /rack/hdfs02...]
>>>>>>
>>>>>>  With such info, is there a way to
>>>>>> 1) login to hfds01, and read the block directly at local file system
>>>>>> level?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Demai on the run
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> *Stanley Shi,*
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Regards,
> *Stanley Shi,*
>
>

Re: Local file system to access hdfs blocks

Posted by Stanley Shi <ss...@pivotal.io>.
As far as I know, there's no combination of Hadoop APIs that can do that.
You can easily get the location of a block (which datanode it is on), but
there's no way to get the local path of that block file.



On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <ni...@gmail.com> wrote:

> Yehia,
>
> No problem at all. I really appreciate your willingness to help. Yeah. now
> I am able to get such information through two steps, and the first step
> will be either hadoop fsck or getFileBlockLocations(). and then search
> the local filesystem, my cluster is using the default from CDH, which is
> /dfs/dn
>
> I would like to it programmatically, so wondering whether someone already
> done it? or maybe better a hadoop API call already implemented for this
> exact purpose
>
> Demai
>
>
> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y....@gmail.com>
> wrote:
>
>> Hi Demai,
>>
>> Sorry, I missed that you are already tried this out. I think you can
>> construct the block location on the local file system if you have the block
>> pool id and the block id. If you are using cloudera distribution, the
>> default location is under /dfs/dn ( the value of dfs.data.dir,
>> dfs.datanode.data.dir configuration keys).
>>
>> Thanks
>> Yehia
>>
>>
>> On 27 August 2014 21:20, Yehia Elshater <y....@gmail.com> wrote:
>>
>>> Hi Demai,
>>>
>>> You can use fsck utility like the following:
>>>
>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>
>>> This will display all the information you need about the blocks of your
>>> file.
>>>
>>> Hope it helps.
>>> Yehia
>>>
>>>
>>> On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:
>>>
>>>> Hi, Stanley,
>>>>
>>>> Many thanks. Your method works. For now, I can have two steps approach:
>>>> 1) getFileBlockLocations to grab hdfs BlockLocation[]
>>>> 2) use local file system call(like find command) to match the block to
>>>> files on local file system .
>>>>
>>>> Maybe there is an existing Hadoop API to return such info in already?
>>>>
>>>> Demai on the run
>>>>
>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>>>
>>>> I am not sure this is what you want but you can try this shell command:
>>>>
>>>> find [DATANODE_DIR] -name [blockname]
>>>>
>>>>
>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>>>>
>>>>> Hi, folks,
>>>>>
>>>>> New in this area. Hopefully to get a couple pointers.
>>>>>
>>>>> I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)
>>>>>
>>>>> I am wondering whether there is a interface to get each hdfs block
>>>>> information in the term of local file system.
>>>>>
>>>>> For example, I can use "Hadoop fsck /tmp/test.txt -files -blocks
>>>>> -racks" to get blockID and its replica on the nodes, such as: repl =3[
>>>>> /rack/hdfs01, /rack/hdfs02...]
>>>>>
>>>>>  With such info, is there a way to
>>>>> 1) login to hfds01, and read the block directly at local file system
>>>>> level?
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Demai on the run
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> *Stanley Shi,*
>>>>
>>>>
>>>
>>
>


-- 
Regards,
*Stanley Shi,*

Re: Local file system to access hdfs blocks

Posted by Demai Ni <ni...@gmail.com>.
Yehia,

No problem at all. I really appreciate your willingness to help. Yeah, now
I am able to get such information in two steps: the first step is either
hadoop fsck or getFileBlockLocations(), and then I search the local
filesystem; my cluster uses the CDH default, which is /dfs/dn.

I would like to do it programmatically, so I am wondering whether someone has
already done it, or, better yet, whether a Hadoop API call is already
implemented for this exact purpose.
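
On the programmatic angle, one hedged option is to drive the fsck step itself
from Java by reusing the DFSck tool class and capturing its output; DFSck is a
private tool and its text output is not a stable format, so treat this as a
sketch only:

// Sketch only: hadoop fsck /tmp/list2.txt -files -blocks -racks, driven from Java.
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSck;
import org.apache.hadoop.util.ToolRunner;

public class ProgrammaticFsck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(buffer, true, "UTF-8");
    int rc = ToolRunner.run(
        new DFSck(conf, out),
        new String[] { "/tmp/list2.txt", "-files", "-blocks", "-racks" });
    out.flush();
    String report = buffer.toString("UTF-8");
    System.out.println("fsck exit code: " + rc);
    // The BP-* block pool id and blk_* block names can then be parsed from 'report'.
    System.out.println(report);
  }
}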

Demai


On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y....@gmail.com>
wrote:

> Hi Demai,
>
> Sorry, I missed that you are already tried this out. I think you can
> construct the block location on the local file system if you have the block
> pool id and the block id. If you are using cloudera distribution, the
> default location is under /dfs/dn ( the value of dfs.data.dir,
> dfs.datanode.data.dir configuration keys).
>
> Thanks
> Yehia
>
>
> On 27 August 2014 21:20, Yehia Elshater <y....@gmail.com> wrote:
>
>> Hi Demai,
>>
>> You can use fsck utility like the following:
>>
>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>
>> This will display all the information you need about the blocks of your
>> file.
>>
>> Hope it helps.
>> Yehia
>>
>>
>> On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:
>>
>>> Hi, Stanley,
>>>
>>> Many thanks. Your method works. For now, I can have two steps approach:
>>> 1) getFileBlockLocations to grab hdfs BlockLocation[]
>>> 2) use local file system call(like find command) to match the block to
>>> files on local file system .
>>>
>>> Maybe there is an existing Hadoop API to return such info in already?
>>>
>>> Demai on the run
>>>
>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>>
>>> I am not sure this is what you want but you can try this shell command:
>>>
>>> find [DATANODE_DIR] -name [blockname]
>>>
>>>
>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>>>
>>>> Hi, folks,
>>>>
>>>> New in this area. Hopefully to get a couple pointers.
>>>>
>>>> I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)
>>>>
>>>> I am wondering whether there is a interface to get each hdfs block
>>>> information in the term of local file system.
>>>>
>>>> For example, I can use "Hadoop fsck /tmp/test.txt -files -blocks
>>>> -racks" to get blockID and its replica on the nodes, such as: repl =3[
>>>> /rack/hdfs01, /rack/hdfs02...]
>>>>
>>>>  With such info, is there a way to
>>>> 1) login to hfds01, and read the block directly at local file system
>>>> level?
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Demai on the run
>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> *Stanley Shi,*
>>>
>>>
>>
>

Re: Local file system to access hdfs blocks

Posted by Yehia Elshater <y....@gmail.com>.
Hi Demai,

Sorry, I missed that you had already tried this out. I think you can
construct the block location on the local file system if you have the block
pool id and the block id. If you are using the Cloudera distribution, the
default location is under /dfs/dn (the value of the dfs.data.dir /
dfs.datanode.data.dir configuration keys).
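
To make the "construct the location" idea concrete, here is a hedged sketch.
It assumes the default /dfs/dn data directory and the usual Hadoop 2.x layout,
where finalized replicas live somewhere under
<data.dir>/current/<blockPoolId>/current/finalized/ (possibly nested in subdir
directories), so it still searches for the block file rather than computing an
exact path:

// Sketch only: the data dir, block pool id and block name below are examples
// taken from this thread; the subdir nesting is an HDFS-internal detail.
import java.io.File;

public class FindLocalBlockFile {

  // Recursively look for the block file (e.g. "blk_1073742025") under 'dir'.
  static File findBlock(File dir, String blockName) {
    File[] children = dir.listFiles();
    if (children == null) {
      return null;
    }
    for (File child : children) {
      if (child.isDirectory()) {
        File found = findBlock(child, blockName);
        if (found != null) {
          return found;
        }
      } else if (child.getName().equals(blockName)) {
        return child;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String dataDir = "/dfs/dn";                                  // dfs.datanode.data.dir
    String bpId = "BP-13-7914115-10.122.195.197-14909166276345"; // block pool id
    String blockName = "blk_1073742025";                         // block name
    File finalized = new File(dataDir, "current/" + bpId + "/current/finalized");
    File block = findBlock(finalized, blockName);
    System.out.println(block == null
        ? "not found on this datanode"
        : block.getAbsolutePath());
  }
}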

Thanks
Yehia


On 27 August 2014 21:20, Yehia Elshater <y....@gmail.com> wrote:

> Hi Demai,
>
> You can use fsck utility like the following:
>
> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>
> This will display all the information you need about the blocks of your
> file.
>
> Hope it helps.
> Yehia
>
>
> On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:
>
>> Hi, Stanley,
>>
>> Many thanks. Your method works. For now, I can have two steps approach:
>> 1) getFileBlockLocations to grab hdfs BlockLocation[]
>> 2) use local file system call(like find command) to match the block to
>> files on local file system .
>>
>> Maybe there is an existing Hadoop API to return such info in already?
>>
>> Demai on the run
>>
>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>>
>> I am not sure this is what you want but you can try this shell command:
>>
>> find [DATANODE_DIR] -name [blockname]
>>
>>
>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>>
>>> Hi, folks,
>>>
>>> New in this area. Hopefully to get a couple pointers.
>>>
>>> I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)
>>>
>>> I am wondering whether there is a interface to get each hdfs block
>>> information in the term of local file system.
>>>
>>> For example, I can use "Hadoop fsck /tmp/test.txt -files -blocks -racks"
>>> to get blockID and its replica on the nodes, such as: repl =3[
>>> /rack/hdfs01, /rack/hdfs02...]
>>>
>>>  With such info, is there a way to
>>> 1) login to hfds01, and read the block directly at local file system
>>> level?
>>>
>>>
>>> Thanks
>>>
>>> Demai on the run
>>
>>
>>
>>
>> --
>> Regards,
>> *Stanley Shi,*
>>
>>
>

Re: Local file system to access hdfs blocks

Posted by Yehia Elshater <y....@gmail.com>.
Hi Demai,

You can use the fsck utility like the following:

hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks

This will display all the information you need about the blocks of your
file.

Hope it helps.
Yehia


On 27 August 2014 20:18, Demai Ni <ni...@gmail.com> wrote:

> Hi, Stanley,
>
> Many thanks. Your method works. For now, I can have two steps approach:
> 1) getFileBlockLocations to grab hdfs BlockLocation[]
> 2) use local file system call(like find command) to match the block to
> files on local file system .
>
> Maybe there is an existing Hadoop API to return such info in already?
>
> Demai on the run
>
> On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:
>
> I am not sure this is what you want but you can try this shell command:
>
> find [DATANODE_DIR] -name [blockname]
>
>
> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>
>> Hi, folks,
>>
>> New in this area. Hopefully to get a couple pointers.
>>
>> I am using Centos and have Hadoop set up using cdh5.1(Hadoop 2.3)
>>
>> I am wondering whether there is a interface to get each hdfs block
>> information in the term of local file system.
>>
>> For example, I can use "Hadoop fsck /tmp/test.txt -files -blocks -racks"
>> to get blockID and its replica on the nodes, such as: repl =3[
>> /rack/hdfs01, /rack/hdfs02...]
>>
>>  With such info, is there a way to
>> 1) login to hfds01, and read the block directly at local file system
>> level?
>>
>>
>> Thanks
>>
>> Demai on the run
>
>
>
>
> --
> Regards,
> *Stanley Shi,*
>
>

Re: Local file system to access hdfs blocks

Posted by Demai Ni <ni...@gmail.com>.
Hi, Stanley,

Many thanks. Your method works. For now, I can use a two-step approach:
1) getFileBlockLocations() to grab the HDFS BlockLocation[]
2) use a local file system call (like the find command) to match the block to
files on the local file system.

Maybe there is an existing Hadoop API that returns such info already?

Demai on the run

On Aug 26, 2014, at 9:14 PM, Stanley Shi <ss...@pivotal.io> wrote:

> I am not sure this is what you want but you can try this shell command:
> 
> find [DATANODE_DIR] -name [blockname]
> 
> 
> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:
>> Hi, folks,
>> 
>> New in this area. Hoping to get a couple of pointers.
>> 
>> I am using CentOS and have Hadoop set up using cdh5.1 (Hadoop 2.3).
>> 
>> I am wondering whether there is an interface to get each hdfs block's information in terms of the local file system.
>> 
>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks -racks" to get the blockID and its replicas on the nodes, such as: repl=3 [ /rack/hdfs01, /rack/hdfs02...]
>> 
>> With such info, is there a way to
>> 1) log in to hdfs01, and read the block directly at the local file system level?
>> 
>> 
>> Thanks
>> 
>> Demai on the run
> 
> 
> 
> -- 
> Regards,
> Stanley Shi,
> 

Re: Local file system to access hdfs blocks

Posted by Stanley Shi <ss...@pivotal.io>.
I am not sure this is what you want but you can try this shell command:

find [DATANODE_DIR] -name [blockname]
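
Once that find has located the replica, it is just an ordinary local file, so
reading it needs nothing Hadoop-specific. A minimal sketch (the path is
whatever the find command printed, passed in as an argument here); the
companion blk_*.meta file next to it holds checksums rather than data:

import java.io.FileInputStream;
import java.io.IOException;

public class ReadBlockReplica {
  public static void main(String[] args) throws IOException {
    String blockFile = args[0];                  // path printed by the find command above
    byte[] buffer = new byte[8192];
    long total = 0;
    FileInputStream in = new FileInputStream(blockFile);
    try {
      int n;
      while ((n = in.read(buffer)) != -1) {
        total += n;                              // here you would process the raw block bytes
      }
    } finally {
      in.close();
    }
    System.out.println("read " + total + " bytes from " + blockFile);
  }
}

Keep in mind each blk_ file only holds that block's slice of the HDFS file, and
the datanode may move or delete it (for example during rebalancing), so this is
fine for inspection but fragile for anything more.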


On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <ni...@gmail.com> wrote:

> Hi, folks,
>
> New in this area. Hoping to get a couple of pointers.
>
> I am using CentOS and have Hadoop set up using cdh5.1 (Hadoop 2.3).
>
> I am wondering whether there is an interface to get each hdfs block's
> information in terms of the local file system.
>
> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks -racks"
> to get the blockID and its replicas on the nodes, such as: repl=3 [
> /rack/hdfs01, /rack/hdfs02...]
>
> With such info, is there a way to
> 1) log in to hdfs01, and read the block directly at the local file system
> level?
>
>
> Thanks
>
> Demai on the run




-- 
Regards,
*Stanley Shi,*
