Posted to user@hadoop.apache.org by Igor Bogomolov <ig...@gmail.com> on 2015/02/23 19:30:40 UTC

tracking remote reads in datanode logs

Hi all,

In a small cluster of 5 nodes that run CDH 5.3.0 (Hadoop 2.5.0), I want to
know how many remote map tasks (ones that read their input data from remote
nodes) there are in a mapreduce job. For this purpose I took the logs of
each datanode and looked for lines with "op: HDFS_READ" whose cliID field
contains a map task id.
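
For reference, the filter I used is roughly equivalent to this sketch (the
cliID field name and the attempt-id pattern are my assumptions about the
clienttrace line format, so adjust them if yours differ):

import re
import sys

# Match HDFS_READ clienttrace lines whose client id embeds a map attempt id,
# e.g. cliID: DFSClient_attempt_1424705621701_0004_m_000002_0_...
READ_BY_MAP = re.compile(r"op: HDFS_READ.*cliID: \S*attempt_\d+_\d+_m_\d+_\d+")

def count_map_reads(log_path):
    """Count HDFS_READ lines issued by map task attempts in one datanode log."""
    count = 0
    with open(log_path, errors="replace") as log:
        for line in log:
            if READ_BY_MAP.search(line):
                count += 1
    return count

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, count_map_reads(path))

I ran the equivalent of this over the datanode log file on every node.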

Surprisingly, 4 of the datanode logs contain no lines with "op: HDFS_READ".
The remaining one has many such lines, but every cliID looks like
DFSClient_NONMAPREDUCE_* and contains no map task id.

I concluded that there are no remote map tasks, but that does not look
correct. Even local reads are not logged (there is no line whose cliID field
contains a map task id). Could anyone please explain what is wrong and why
this logging is not working? (I use the default settings.)

Chris,

I found HADOOP-3062 <https://issues.apache.org/jira/browse/HADOOP-3062>,
which you implemented, and thought you might have an explanation.

Best,
Igor

Re: tracking remote reads in datanode logs

Posted by Igor Bogomolov <ig...@gmail.com>.
Thanks a lot!

Igor

On Tue, Feb 24, 2015 at 11:46 PM, Drake민영근 <dr...@nexr.com> wrote:

> Hi, Igor
>
> The AM logs are in HDFS if you enable the log aggregation property.
> Otherwise, they are in the container log directory. See this:
> http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
>
> Thanks
>
> --
> Drake 민영근 Ph.D
> kt NexR
>
>

Re: tracking remote reads in datanode logs

Posted by Drake민영근 <dr...@nexr.com>.
Hi, Igor

The AM logs are in HDFS if you enable the log aggregation property. Otherwise,
they are in the container log directory. See this:
http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
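
With aggregation enabled, something like this sketch will pull an
application's logs through the yarn CLI and keep the allocator lines that
carry the locality counters (the application id below is only a placeholder):

import subprocess
import sys

def am_allocator_lines(app_id):
    """Fetch the aggregated YARN logs for an application and keep the
    RMContainerAllocator lines."""
    out = subprocess.check_output(["yarn", "logs", "-applicationId", app_id],
                                  universal_newlines=True)
    return [line for line in out.splitlines() if "RMContainerAllocator" in line]

if __name__ == "__main__":
    # e.g. python am_allocator_lines.py application_1424705621701_0004
    for line in am_allocator_lines(sys.argv[1]):
        print(line)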

Thanks

On Wednesday, February 25, 2015, Igor Bogomolov <ig...@gmail.com> wrote:

> Hi Drake,
>
>> Thanks for the pointer. The AM log does have information about remote map
>> tasks, but I'd like more low-level details, like which node each map task
>> was scheduled on and how many bytes were read. That should be exactly what
>> the datanode log shows, and I saw it for another job. But after I
>> reinstalled the cluster it's not there anymore :(
>
>> Could you please tell me the path where the AM log is located (the one you
>> copied the lines from)? I found it in the web interface but not as a file
>> on disk. And there is nothing in /var/log/hadoop-*
>
> Thanks,
> Igor
>

-- 
Drake 민영근 Ph.D
kt NexR

Re: tracking remote reads in datanode logs

Posted by Igor Bogomolov <ig...@gmail.com>.
Hi Drake,

Thanks for the pointer. The AM log does have information about remote map
tasks, but I'd like more low-level details, like which node each map task
was scheduled on and how many bytes were read. That should be exactly what
the datanode log shows, and I saw it for another job. But after I
reinstalled the cluster it's not there anymore :(
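
(For what it's worth, once those clienttrace lines show up again, what I'm
after could be pulled out with something like the sketch below. The
src/dest/bytes/cliID field order is my assumption about the clienttrace
format, so it may need adjusting.)

import re
from collections import defaultdict

# Assumed clienttrace layout:
#   src: /10.0.0.1:50010, dest: /10.0.0.2:34567, bytes: 132096,
#   op: HDFS_READ, cliID: DFSClient_attempt_..._m_..., ...
LINE = re.compile(
    r"src: /(?P<src>[^:]+):\d+, dest: /(?P<dest>[^:]+):\d+, "
    r"bytes: (?P<bytes>\d+), op: HDFS_READ, "
    r"cliID: \S*(?P<attempt>attempt_\d+_\d+_m_\d+_\d+)"
)

def read_bytes_per_attempt(log_paths):
    """Sum HDFS_READ bytes per map attempt, split into local and remote reads
    by comparing the two hosts on the line (a read is local when they match)."""
    totals = defaultdict(lambda: {"local": 0, "remote": 0})
    for path in log_paths:
        with open(path, errors="replace") as log:
            for line in log:
                m = LINE.search(line)
                if not m:
                    continue
                kind = "local" if m["src"] == m["dest"] else "remote"
                totals[m["attempt"]][kind] += int(m["bytes"])
    return totals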

Could you please tell me the path where the AM log is located (the one you
copied the lines from)? I found it in the web interface but not as a file on
disk. And there is nothing in /var/log/hadoop-*

Thanks,
Igor

On Tue, Feb 24, 2015 at 1:51 AM, Drake민영근 <dr...@nexr.com> wrote:

> I found this in the mapreduce am log.
>
> 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
> Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
> HostLocal:0 RackLocal:0
> ..
> 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
> HostLocal:3 RackLocal:2
> ..
>
> The first line says there are 5 map tasks, and the second says HostLocal is
> 3 and RackLocal is 2. I think the 2 rack-local tasks are the remote map
> tasks you mentioned before.
>
>
> Drake 민영근 Ph.D
> kt NexR
>

Re: tracking remote reads in datanode logs

Posted by Drake민영근 <dr...@nexr.com>.
I found this in the mapreduce am log.

2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
HostLocal:0 RackLocal:0
..
2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
HostLocal:3 RackLocal:2
..

The first line says there are 5 map tasks, and the second says HostLocal is 3
and RackLocal is 2. I think the 2 rack-local tasks are the remote map tasks
you mentioned before.
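
If it helps, those counters can be pulled out of the AM log with a small
sketch like this one (it only parses "After Scheduling" lines of the shape
quoted above, so treat it as a rough helper):

import re
import sys

COUNTER = re.compile(r"(AssignedMaps|HostLocal|RackLocal):(\d+)")

def locality_counters(lines):
    """Return the last-seen AssignedMaps/HostLocal/RackLocal counters from
    RMContainerAllocator "After Scheduling" log lines."""
    counters = {}
    for line in lines:
        if "RMContainerAllocator" in line and "After Scheduling" in line:
            counters.update({k: int(v) for k, v in COUNTER.findall(line)})
    return counters

if __name__ == "__main__":
    # e.g. yarn logs -applicationId <application id> | python locality_counters.py
    c = locality_counters(sys.stdin)
    # Per the reading above: HostLocal maps read locally, RackLocal ones are remote.
    print(c, "-> remote map tasks:", c.get("RackLocal", 0))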


Drake 민영근 Ph.D
kt NexR

On Tue, Feb 24, 2015 at 9:45 AM, Drake민영근 <dr...@nexr.com> wrote:

> Hi, Igor
>
> Did you look at the MapReduce application master log? I think the host-local
> and rack-local map task counts are logged in the MapReduce AM log.
>
> Good luck.
>
> Drake 민영근 Ph.D
> kt NexR
>

Re: tracking remote reads in datanode logs

Posted by Drake민영근 <dr...@nexr.com>.
Hi, Igor

Did you look at the MapReduce application master log? I think the host-local
and rack-local map task counts are logged in the MapReduce AM log.

Good luck.

Drake 민영근 Ph.D
kt NexR

