Posted to user@hadoop.apache.org by Ravi Shetye <ra...@gmail.com> on 2013/06/12 09:17:48 UTC

Task Tracker going down on hive cluster

In the last 4-5 days, the TaskTracker on one of my slave machines has gone
down a couple of times. It had been working fine for the past 4-5 months.

The cluster configuration is:
4-machine cluster on AWS
1 m2.xlarge master
3 m2.xlarge slaves

The cluster is dedicated to running Hive queries, with the data residing on S3.

The slave on which the TaskTracker went down had the following log:

*******************************************************************
2013-06-11 00:26:30,968 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005693_0, duration: 279198
2013-06-11 00:26:30,971 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 193135
2013-06-11 00:26:30,971 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 192011
2013-06-11 00:26:30,972 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005693_0, duration: 178209
2013-06-11 00:26:30,973 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005694_0, duration: 186452
2013-06-11 00:26:30,973 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005694_0, duration: 157360
2013-06-11 00:26:30,974 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 157555
2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not killed jvm_201306071409_0151_m_-435659475 but just removed
2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks it ran: 0
2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker.
org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
    at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
    at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
    at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
    at java.io.BufferedWriter.close(BufferedWriter.java:265)
    at java.io.PrintWriter.close(PrintWriter.java:312)
    at org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
    at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
Caused by: java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
    ... 13 more
2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
2013-06-11 00:26:31,008 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005694_0, duration: 222430
2013-06-11 00:26:31,008 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005693_0, duration: 154027
2013-06-11 00:26:31,008 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 132067
2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201306071409_0151_m_-495709221 spawned.
2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
2013-06-11 00:26:31,331 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 437236
2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
************************************************************/

-- 
RAVI SHETYE
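The stack trace hints at why one failed write aborted the whole TaskTracker instead of failing a single task attempt. `TaskController.writeCommand` writes the task JVM launch commands through a `java.io.PrintWriter`, and `PrintWriter.close()` only catches `IOException` (recording it for `checkError()`). Hadoop's `RawLocalFileSystem` wrapped the `IOException: Broken pipe` in `org.apache.hadoop.fs.FSError`, which extends `Error`, so it escaped `PrintWriter` and reached `JvmManager`, which logged "Caught Throwable in JVMRunner. Aborting TaskTracker." Below is a minimal, self-contained sketch of that mechanism, not Hadoop code: `SimulatedFsError` and `FailingStream` are hypothetical stand-ins for `FSError` and the local file stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

public class PrintWriterErrorDemo {

    /** Hypothetical stand-in for org.apache.hadoop.fs.FSError: an Error wrapping an IOException. */
    static class SimulatedFsError extends Error {
        SimulatedFsError(Throwable cause) {
            super(cause);
        }
    }

    /** An OutputStream whose writes always fail, optionally wrapping the failure in an Error. */
    static class FailingStream extends OutputStream {
        private final boolean wrapInError;

        FailingStream(boolean wrapInError) {
            this.wrapInError = wrapInError;
        }

        @Override
        public void write(int b) throws IOException {
            fail();
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            fail();
        }

        private void fail() throws IOException {
            IOException broken = new IOException("Broken pipe");
            if (wrapInError) {
                throw new SimulatedFsError(broken); // not an IOException, so PrintWriter lets it escape
            }
            throw broken; // PrintWriter swallows this and reports it via checkError()
        }
    }

    /** Writes one line, closes the writer, and reports how the write failure surfaced. */
    static String writeAndClose(boolean wrapInError) {
        PrintWriter pw = new PrintWriter(new FailingStream(wrapInError));
        pw.println("exec java ..."); // only buffered; nothing reaches the stream yet
        try {
            pw.close(); // flushing the buffered bytes hits the failing stream
            return pw.checkError() ? "swallowed" : "ok";
        } catch (Error e) {
            return "propagated: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndClose(false)); // prints "swallowed"
        System.out.println(writeAndClose(true));  // prints "propagated: Broken pipe"
    }
}
```

This only explains why the failure was fatal to the daemon; it does not explain why writing to a local file returned `Broken pipe` in the first place.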

Re: Task Tracker going down on hive cluster

Posted by Ravi Shetye <ra...@gmail.com>.
Restarting Hadoop with start-all.sh brought the cluster back to working
condition.
I do not think there has been any persistent network change.
I am checking with the AWS folks whether there was a temporary failure.


On Wed, Jun 12, 2013 at 6:20 PM, Shahab Yunus <sh...@gmail.com> wrote:

> Broken pipe is usually a network-related issue. Have you verified that
> there has been no change in network connectivity?
>
> Regards,
> Shahab


-- 
RAVI SHETYE
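One way to act on Shahab's suggestion is to spot-check TCP connectivity from a slave to the ports involved, for example the TaskTracker port 50060 that appears as the `src` port in the clienttrace lines. The sketch below is a hypothetical helper (`PortCheck.isReachable` is not part of Hadoop) built on `java.net.Socket.connect` with a timeout. A single successful check says nothing about the temporary failure suspected here, so it would need to be run repeatedly to catch intermittent drops.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /**
     * Returns true if a TCP connection to host:port can be opened within timeoutMillis.
     * Unresolvable hosts, refused connections and timeouts all count as "not reachable".
     */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java PortCheck <slave-host> [port]; the port defaults to 50060.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50060;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```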

Re: Task Tracker going down on hive cluster

Posted by Harsh J <ha...@cloudera.com>.
Is /mnt/app/ an NFS mount?

On Wed, Jun 12, 2013 at 6:20 PM, Shahab Yunus <sh...@gmail.com> wrote:
> Broken pipe is usually a network-related issue. Have you verified that
> there has been no change in network connectivity?
>
> Regards,
> Shahab

-- 
Harsh J
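Harsh's question can be answered on the slave itself. Java's `java.nio.file.Files.getFileStore(path).type()` reports the file-system type of the mount containing a path; on Linux this typically comes from the mount table, so an NFS-backed `/mnt/app` would usually show up as `nfs` or `nfs4`. A minimal sketch, assuming a Linux slave; `FsTypeCheck` and its helper names are hypothetical, and treating any type beginning with "nfs" as NFS is a heuristic.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class FsTypeCheck {

    /** Returns the file-system type the JDK reports for the store containing path, e.g. "ext4". */
    static String fsType(Path path) throws IOException {
        FileStore store = Files.getFileStore(path); // throws NoSuchFileException if path is missing
        return store.type();
    }

    /** Heuristic: Linux reports NFS mounts with types such as "nfs" or "nfs4". */
    static boolean looksLikeNfs(String type) {
        return type.toLowerCase(Locale.ROOT).startsWith("nfs");
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "/mnt/app");
        String type = fsType(path);
        System.out.println(path + " is on a '" + type + "' file system; looks like NFS: " + looksLikeNfs(type));
    }
}
```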

>> at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>> at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>> at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
>> at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
>> at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
>> at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
>> at java.io.BufferedWriter.close(BufferedWriter.java:265)
>> at java.io.PrintWriter.close(PrintWriter.java:312)
>> at
>> org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
>> at
>> org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
>> at
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
>> at
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
>> Caused by: java.io.IOException: Broken pipe
>> at java.io.FileOutputStream.writeBytes(Native Method)
>> at java.io.FileOutputStream.write(FileOutputStream.java:297)
>> at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
>> ... 13 more
>> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
>> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 222430
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 154027
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 132067
>> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> Runner jvm_201306071409_0151_m_-495709221 spawned.
>> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
>> Writing commands to
>> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
>> 2013-06-11 00:26:31,331 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 437236
>> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
>> ************************************************************/
>>
>> --
>> RAVI SHETYE
>
>



-- 
Harsh J

Re: Task Tracker going down on hive cluster

Posted by Ravi Shetye <ra...@gmail.com>.
Restarting Hadoop with start-all.sh brought the cluster back to working
condition.
I do not think there has been any persistent network change.
I am checking with the AWS folks whether there was a temporary failure.
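Since the crash has recurred a few times, it may help to pull every abort out of the TaskTracker logs so the exact timestamps can be compared against anything AWS reports. A minimal sketch, assuming the log4j line layout visible in the log in this thread (date, time with milliseconds, level, logger, message); the function names `iter_entries` and `find_aborts` are hypothetical.

```python
import re

# Start of a log4j entry, as laid out in the TaskTracker log quoted in this thread:
# "<yyyy-mm-dd> <hh:mm:ss>,<millis> <LEVEL> <logger>: <message>"
ENTRY_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ([A-Z]+)(?: (.*))?$"
)


def iter_entries(log_text):
    """Yield (timestamp, level, text) for each log entry.

    Lines that do not start with a timestamp (wrapped messages, stack-trace
    frames) are folded into the entry that precedes them.
    """
    entry = None
    for line in log_text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            if entry is not None:
                yield entry[0], entry[1], " ".join(entry[2])
            timestamp, level, rest = match.groups()
            entry = (timestamp, level, [rest or ""])
        elif entry is not None:
            entry[2].append(line.strip())
    if entry is not None:
        yield entry[0], entry[1], " ".join(entry[2])


def find_aborts(log_text):
    """Return the timestamps of ERROR entries where the TaskTracker aborts itself."""
    return [
        timestamp
        for timestamp, level, text in iter_entries(log_text)
        if level == "ERROR" and "Aborting TaskTracker" in text
    ]
```

Feed it the text of each TaskTracker log file, e.g. `find_aborts(open(path).read())`. Folding continuation lines into the previous entry means the check still works when a message is wrapped, as it is in the mail above.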


On Wed, Jun 12, 2013 at 6:20 PM, Shahab Yunus <sh...@gmail.com> wrote:

> Broken pipe is usually a network-related issue. Have you verified that
> nothing has changed in network connectivity?
>
> Regards,
> Shahab
>
>
> On Wed, Jun 12, 2013 at 3:17 AM, Ravi Shetye <ra...@gmail.com> wrote:
>
>> In the last 4-5 days the task tracker on one of my slave machines has gone
>> down a couple of times. It had been working fine for the past 4-5 months.
>>
>> The cluster configuration is:
>> 4-machine cluster on AWS
>> 1 m2.xlarge master
>> 3 m2.xlarge slaves
>>
>> The cluster is dedicated to running Hive queries, with the data residing
>> on S3.
>>
>> The slave on which the task tracker went down had the following log:
>>
>> *******************************************************************
>> 2013-06-11 00:26:30,968 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 279198
>> 2013-06-11 00:26:30,971 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 193135
>> 2013-06-11 00:26:30,971 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 192011
>> 2013-06-11 00:26:30,972 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 178209
>> 2013-06-11 00:26:30,973 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 186452
>> 2013-06-11 00:26:30,973 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 157360
>> 2013-06-11 00:26:30,974 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 157555
>> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not
>> killed jvm_201306071409_0151_m_-435659475 but just removed
>> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>> jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks
>> it ran: 0
>> 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught
>> Throwable in JVMRunner. Aborting TaskTracker.
>> org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
>> at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
>>  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>> at
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>>  at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
>>  at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
>> at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
>>  at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
>> at java.io.BufferedWriter.close(BufferedWriter.java:265)
>>  at java.io.PrintWriter.close(PrintWriter.java:312)
>> at
>> org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
>>  at
>> org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
>> at
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
>>  at
>> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
>> Caused by: java.io.IOException: Broken pipe
>> at java.io.FileOutputStream.writeBytes(Native Method)
>>  at java.io.FileOutputStream.write(FileOutputStream.java:297)
>> at
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
>>  ... 13 more
>> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
>> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005694_0, duration: 222430
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005693_0, duration: 154027
>> 2013-06-11 00:26:31,008 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 132067
>> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> Runner jvm_201306071409_0151_m_-495709221 spawned.
>> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
>> Writing commands to
>> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
>> 2013-06-11 00:26:31,331 INFO
>> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
>> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
>> attempt_201306071409_0151_m_005700_0, duration: 437236
>> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
>> ************************************************************/
>>
>> --
>> RAVI SHETYE
>>
>
>


-- 
RAVI SHETYE

Re: Task Tracker going down on hive cluster

Posted by Shahab Yunus <sh...@gmail.com>.
Broken pipe is usually a network-related issue. Have you verified that
nothing has changed in network connectivity?

Regards,
Shahab
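For reference on the error itself: "Broken pipe" is the standard message for the POSIX `EPIPE` error, which `write()` returns when the descriptor is a pipe or socket whose reading end has been closed. The minimal sketch below reproduces it with an ordinary OS pipe; it is illustrative only, and it relies on CPython ignoring `SIGPIPE` at startup so that the failed write surfaces as `BrokenPipeError` instead of terminating the process. The function name `write_to_closed_pipe` is hypothetical. Because the trace in this thread shows `EPIPE` coming from what Hadoop treats as a plain local-file write (`RawLocalFileSystem`), one plausible reading is that the target directory is not an ordinary local disk, but that is an inference rather than a diagnosis.

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe after closing its read end and return the errno raised.

    CPython sets SIGPIPE to SIG_IGN at startup, so the kernel's EPIPE reaches
    Python code as BrokenPipeError rather than killing the interpreter.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # no reader is left on the other end
    try:
        os.write(write_fd, b"payload\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None  # the write unexpectedly succeeded


if __name__ == "__main__":
    code = write_to_closed_pipe()
    if code is not None:
        print(code, errno.errorcode[code], os.strerror(code))
```

On Linux, `EPIPE` is errno 32 and `os.strerror` renders it as "Broken pipe", the same text carried by the `java.io.IOException` in the trace.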


On Wed, Jun 12, 2013 at 3:17 AM, Ravi Shetye <ra...@gmail.com> wrote:

> In the last 4-5 days the task tracker on one of my slave machines has gone
> down a couple of times. It had been working fine for the past 4-5 months.
>
> The cluster configuration is:
> 4-machine cluster on AWS
> 1 m2.xlarge master
> 3 m2.xlarge slaves
>
> The cluster is dedicated to running Hive queries, with the data residing on S3.
>
> The slave on which the task tracker went down had the following log:
>
> *******************************************************************
> 2013-06-11 00:26:30,968 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 279198
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 193135
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 192011
> 2013-06-11 00:26:30,972 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 178209
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 186452
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 157360
> 2013-06-11 00:26:30,974 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 157555
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not
> killed jvm_201306071409_0151_m_-435659475 but just removed
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks
> it ran: 0
> 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught
> Throwable in JVMRunner. Aborting TaskTracker.
> org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
> at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
>  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>  at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
>  at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
> at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
>  at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
> at java.io.BufferedWriter.close(BufferedWriter.java:265)
>  at java.io.PrintWriter.close(PrintWriter.java:312)
> at
> org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
>  at
> org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
> at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
>  at
> org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
> Caused by: java.io.IOException: Broken pipe
> at java.io.FileOutputStream.writeBytes(Native Method)
>  at java.io.FileOutputStream.write(FileOutputStream.java:297)
> at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
>  ... 13 more
> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 222430
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 154027
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 132067
> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201306071409_0151_m_-495709221 spawned.
> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
> Writing commands to
> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
> 2013-06-11 00:26:31,331 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 437236
> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
> ************************************************************/
>
> --
> RAVI SHETYE
>

Re: Task Tracker going down on hive cluster

Posted by Shahab Yunus <sh...@gmail.com>.
Broken Pipe is a network related issue usually. Have you verified no change
in network connectivity?

Regards,
Shahab


On Wed, Jun 12, 2013 at 3:17 AM, Ravi Shetye <ra...@gmail.com> wrote:

> In last 4-5 of day the task tracker on one of my slave machines has gone
> down couple of time. It has been working fine from the past 4-5 months
>
> The cluster configuration is
> 4 machine cluster on AWS
> 1 m2.xlarge master
> 3 m2.xlarge slaves
>
> The cluster is dedicated to run hive queries, with the data residing on s3.
>
> the slave on which the task tracker went down had the following log
>
> *******************************************************************
> 2013-06-11 00:26:30,968 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 279198
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 193135
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 192011
> 2013-06-11 00:26:30,972 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 178209
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 186452
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 157360
> 2013-06-11 00:26:30,974 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 157555
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not killed jvm_201306071409_0151_m_-435659475 but just removed
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks it ran: 0
> 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker.
> org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
>     at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
>     at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
>     at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
>     at java.io.BufferedWriter.close(BufferedWriter.java:265)
>     at java.io.PrintWriter.close(PrintWriter.java:312)
>     at org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
>     at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
>     at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
>     at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
> Caused by: java.io.IOException: Broken pipe
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:297)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
>     ... 13 more
> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 222430
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 154027
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 132067
> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201306071409_0151_m_-495709221 spawned.
> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
> Writing commands to
> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
> 2013-06-11 00:26:31,331 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 437236
> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
> ************************************************************/
>
> --
> RAVI SHETYE
>

Re: Task Tracker going down on hive cluster

Posted by Rob Roland <ro...@simplymeasured.com>.
I'm not an expert in this, but I do see a broken pipe while writing to the
local file system on your TaskTracker. Is it possible that you're out of
disk space, or that your EBS volume is failing? S3 doesn't appear anywhere
in that stack trace.
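A minimal sketch of both checks, assuming Java 7+: report the usable space on the filesystem holding a TaskTracker local directory, and try a small write that is closed afterwards, roughly like the TaskController writing taskjvm.sh in the log. The class LocalDirCheck and its methods are hypothetical. The default directory is the hadoop-tmp prefix from the TaskController log line; on Hadoop 1.x you would normally run it against each directory listed in mapred.local.dir.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalDirCheck {
    /** Bytes available to this JVM on the filesystem that holds dir (0 if dir does not exist). */
    static long usableBytes(Path dir) {
        return dir.toFile().getUsableSpace();
    }

    /**
     * Creates a small script-like file in dir, writes and closes it, then deletes it.
     * Unlike PrintWriter, BufferedWriter does not swallow IOExceptions, so a failing
     * write or close shows up here as a false result.
     */
    static boolean canWrite(Path dir) {
        Path probe = null;
        try {
            probe = Files.createTempFile(dir, "tt-probe", ".sh");
            try (BufferedWriter w = Files.newBufferedWriter(probe, StandardCharsets.UTF_8)) {
                w.write("#!/bin/sh\necho probe\n");
            }
            return Files.size(probe) > 0;
        } catch (IOException e) {
            return false;
        } finally {
            if (probe != null) {
                try {
                    Files.deleteIfExists(probe);
                } catch (IOException ignored) {
                    // Best-effort cleanup; the result is already decided.
                }
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical default: the hadoop-tmp prefix seen in the TaskController log line.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/mnt/app/hadoop-tmp");
        System.out.printf("%s usable: %d MB, writable: %b%n",
                dir, usableBytes(dir) / (1024 * 1024), canWrite(dir));
    }
}
```

A writable directory with plenty of space would point away from the disk theory; a failing write or near-zero space would support it. Checking the kernel log on that slave for I/O errors is a sensible companion step.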

