Posted to user@pig.apache.org by Irfan Mohammed <ir...@gmail.com> on 2009/09/08 16:58:13 UTC
OutOfMemory Errors when loading a Gzip file
Hi,
I am trying to load a large gzip file and process it using Pig. Every time I
run the following script, I get OutOfMemory errors.
The hadoop-site.xml is attached, as are the Pig and Hadoop jobtracker logs.
$ pig
>>>
x1 = LOAD 'file:///mnt/transaction_ar20090907_1102_126.CSV.gz' using
PigStorage('\u0002');
y1 = LIMIT x1 10;
dump y1;
>>>
Environment :
hadoop-0.20.0
pig-0.3.0 [ patched with Pig-660-4 to work with hadoop-0.20.0 ]
ec2 [ c1.medium ]
Thanks,
Irfan
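[Editorial note: a quick sanity check, not from the thread itself. The stack traces later in this thread die inside PigStorage.readField while growing an ArrayList, which usually means a single record never terminates. Measuring the longest line in the gzip stream shows whether the file really contains newline-terminated records of reasonable size. The sample file below is a made-up stand-in for the real /mnt/transaction_ar20090907_1102_126.CSV.gz:]

```shell
# Stand-in sample; \002 is the field delimiter used in the LOAD statement.
printf 'a\002b\nlonger_record\002x\n' | gzip > /tmp/sample.csv.gz

# Print the longest line in the decompressed stream. A value close to the
# whole file size would mean there are no usable record terminators, so
# PigStorage must buffer the entire input as one record.
gzip -dc /tmp/sample.csv.gz |
  awk '{ if (length($0) > max) max = length($0) } END { print max }'
# prints 15
```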
Re: OutOfMemory Errors when loading a Gzip file
Posted by prasenjit mukherjee <pr...@gmail.com>.
I am also grappling with a similar class of problems. In
my case the files are uncompressed (and hence, I assume, splittable on
record boundaries). The records are quite small, though the total
number of records could be in the hundreds of millions.
I would like to know how Pig splits files for LOAD/STORE operations
specifically. Does the fact that Pig is instructed to use a local file
(LOAD 'file:///...') make any difference?
-Thanks,
Prasen
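[Editorial note: general background, not stated in the thread. In Hadoop 0.18/0.20, plain text files are split at roughly block-size byte offsets and each reader resynchronizes at the next newline, so uncompressed files of small records split fine; gzip is not a splittable codec, so a gzipped file always becomes a single map task. A `file:///` URI changes where the data is read from, not how it is split. A common workaround is to pre-split the data and compress each chunk separately so every chunk becomes its own map task; a minimal sketch with made-up file names:]

```shell
rm -f /tmp/chunk_*
# Stand-in for the real input: 2,500 one-line records.
seq 1 2500 > /tmp/transactions.csv

# Split into 1,000-line chunks, then gzip each chunk separately.
# Each .gz is still unsplittable, but now there are several of them,
# so the load spreads across several map tasks.
cd /tmp
split -l 1000 transactions.csv chunk_
gzip -f chunk_*
ls chunk_*.gz    # chunk_aa.gz chunk_ab.gz chunk_ac.gz
```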
On Tue, Sep 8, 2009 at 10:58 AM, Irfan Mohammed<ir...@gmail.com> wrote:
> [...]
Re: OutOfMemory Errors when loading a Gzip file
Posted by Alan Gates <ga...@yahoo-inc.com>.
How large are the records in your file? Do you expect any single
record to be in the multi-megabyte range?
Have you tried decompressing the file and reading it to see whether the
issue is the compression?
Alan.
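[Editorial note, not from the thread: the JVM that dies here is the map task child, whose heap on Hadoop 0.20 is governed by mapred.child.java.opts (default -Xmx200m). Raising it in hadoop-site.xml (or mapred-site.xml) is a blunt but quick experiment to see whether the records are merely large rather than unbounded. The 1 GB value below is an arbitrary example; note a c1.medium has only 1.7 GB of RAM in total:]

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```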
On Sep 8, 2009, at 7:58 AM, Irfan Mohammed wrote:
> [...]
>
> ERROR 6016: Out of memory.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias y1
>         at org.apache.pig.PigServer.openIterator(PigServer.java:469)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:522)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>         at org.apache.pig.Main.main(Main.java:350)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
>         at java.util.Arrays.copyOf(Arrays.java:2760)
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
>         at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
>         at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> Caused by: java.lang.OutOfMemoryError
>
> 2009-09-08 10:06:13,551 WARN org.apache.hadoop.conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2009-09-08 10:06:13,581 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = domU-12-31-39-07-50-C2.compute-1.internal/10.209.83.48
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.0
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
> ************************************************************/
> 2009-09-08 10:06:13,815 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=50002
> 2009-09-08 10:06:13,908 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-09-08 10:06:14,027 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
> 2009-09-08 10:06:14,027 INFO org.mortbay.log: jetty-6.1.14
> 2009-09-08 10:06:16,821 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> 2009-09-08 10:06:16,822 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2009-09-08 10:06:16,872 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 50002
> 2009-09-08 10:06:16,872 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
> 2009-09-08 10:06:17,045 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
> 2009-09-08 10:06:17,274 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2009-09-08 10:06:17,275 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50002: starting
> 2009-09-08 10:06:17,275 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50002: starting
> 2009-09-08 10:06:17,275 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> 2009-09-08 10:06:17,277 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 50002: starting
> 2009-09-08 10:06:17,716 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/domU-12-31-39-07-50-C2.compute-1.internal
> 2009-09-08 10:13:03,008 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0001
> 2009-09-08 10:13:03,013 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar for job job_200909081006_0001
> 2009-09-08 10:13:03,026 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar.recover for recovery.
> 2009-09-08 10:13:04,432 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar.recover as the master history file for user.
> 2009-09-08 10:13:05,194 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0001 = 151
> 2009-09-08 10:13:05,194 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0001 with 1 splits:
> 2009-09-08 10:13:05,195 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/localhost
> 2009-09-08 10:13:05,195 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0001_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:13:06,253 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0001_m_000002_0' to tip task_200909081006_0001_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:09,306 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0001_m_000002_0' has completed task_200909081006_0001_m_000002 successfully.
> 2009-09-08 10:13:09,310 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0001_m_000000_0' to tip task_200909081006_0001_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:09,312 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0001_m_000000
> 2009-09-08 10:13:15,350 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0001_m_000000_0' has completed task_200909081006_0001_m_000000 successfully.
> 2009-09-08 10:13:15,353 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0001_m_000001_0' to tip task_200909081006_0001_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:18,366 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0001_m_000001_0' has completed task_200909081006_0001_m_000001 successfully.
> 2009-09-08 10:13:18,367 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200909081006_0001 has completed successfully.
> 2009-09-08 10:13:18,442 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0001 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar
> 2009-09-08 10:13:18,544 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0001_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:18,544 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0001_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:18,544 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0001_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:27,843 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0002
> 2009-09-08 10:13:27,845 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar for job job_200909081006_0002
> 2009-09-08 10:13:27,846 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar.recover for recovery.
> 2009-09-08 10:13:27,920 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar.recover as the master history file for user.
> 2009-09-08 10:13:28,481 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0002 = 151
> 2009-09-08 10:13:28,481 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0002 with 1 splits:
> 2009-09-08 10:13:28,482 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0002_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:13:30,555 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_m_000002_0' to tip task_200909081006_0002_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:33,562 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_m_000002_0' has completed task_200909081006_0002_m_000002 successfully.
> 2009-09-08 10:13:33,563 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_m_000000_0' to tip task_200909081006_0002_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:33,563 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0002_m_000000
> 2009-09-08 10:13:36,604 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_m_000000_0' has completed task_200909081006_0002_m_000000 successfully.
> 2009-09-08 10:13:36,605 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:1 completedMapsInputSize:152 completedMapsOutputSize:409
> 2009-09-08 10:13:36,610 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_r_000000_0' to tip task_200909081006_0002_r_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:48,631 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_r_000000_0' has completed task_200909081006_0002_r_000000 successfully.
> 2009-09-08 10:13:48,634 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_m_000001_0' to tip task_200909081006_0002_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,638 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_m_000001_0' has completed task_200909081006_0002_m_000001 successfully.
> 2009-09-08 10:13:51,638 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200909081006_0002 has completed successfully.
> 2009-09-08 10:13:51,754 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0002 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_r_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:58,252 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0003
> 2009-09-08 10:13:58,255 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName for job job_200909081006_0003
> 2009-09-08 10:13:58,258 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName.recover for recovery.
> 2009-09-08 10:13:58,337 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName.recover as the master history file for user.
> 2009-09-08 10:13:58,890 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0003 = 151
> 2009-09-08 10:13:58,890 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0003 with 1 splits:
> 2009-09-08 10:13:58,890 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0003_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:14:00,850 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_m_000002_0' to tip task_200909081006_0003_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:03,854 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_m_000002_0' has completed task_200909081006_0003_m_000002 successfully.
> 2009-09-08 10:14:03,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_m_000000_0' to tip task_200909081006_0003_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:03,855 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0003_m_000000
> 2009-09-08 10:14:06,860 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_m_000000_0' has completed task_200909081006_0003_m_000000 successfully.
> 2009-09-08 10:14:06,860 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:1 completedMapsInputSize:152 completedMapsOutputSize:409
> 2009-09-08 10:14:06,862 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_r_000000_0' to tip task_200909081006_0003_r_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:21,888 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_r_000000_0' has completed task_200909081006_0003_r_000000 successfully.
> 2009-09-08 10:14:21,889 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_m_000001_0' to tip task_200909081006_0003_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:24,892 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_m_000001_0' has completed task_200909081006_0003_m_000001 successfully.
> 2009-09-08 10:14:24,893 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200909081006_0003 has completed successfully.
> 2009-09-08 10:14:24,979 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0003 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_r_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:13,466 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0004
> 2009-09-08 10:15:13,470 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar for job job_200909081006_0004
> 2009-09-08 10:15:13,471 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar.recover for recovery.
> 2009-09-08 10:15:13,587 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar.recover as the master history file for user.
> 2009-09-08 10:15:14,203 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0004 = 260104276
> 2009-09-08 10:15:14,203 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0004 with 1 splits:
> 2009-09-08 10:15:14,203 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0004_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:15:16,080 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000002_0' to tip task_200909081006_0004_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:19,096 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0004_m_000002_0' has completed task_200909081006_0004_m_000002 successfully.
> 2009-09-08 10:15:19,097 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_0' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:19,097 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:15:43,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_0: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2760)
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
>         at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
>         at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:15:46,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_1' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:46,151 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:15:46,151 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:10,341 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_1: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2760)
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
>         at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
>         at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:16:13,345 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_2' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:13,345 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:16:13,345 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_1' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:37,468 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_2: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2760)
>         at java.util.Arrays.copyOf(Arrays.java:2734)
>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>         at java.util.ArrayList.add(ArrayList.java:351)
>         at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
>         at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
>         at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:16:40,472 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_3' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:40,472 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:16:40,472 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_2' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:04,550 INFO
> org.apache.hadoop.mapred.TaskInProgress: Error from
> attempt_200909081006_0004_m_000000_3: java.lang.OutOfMemoryError:
> Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2760)
> at java.util.Arrays.copyOf(Arrays.java:2734)
> at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> at java.util.ArrayList.add(ArrayList.java:351)
> at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
> at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
> at
> org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:
> 104)
> at
> org
> .apache
> .pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper
> $1.next(SliceWrapper.java:162)
> at
> org
> .apache
> .pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper
> $1.next(SliceWrapper.java:138)
> at org.apache.hadoop.mapred.MapTask
> $TrackedRecordReader.moveToNext(MapTask.java:191)
> at org.apache.hadoop.mapred.MapTask
> $TrackedRecordReader.next(MapTask.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:17:07,554 INFO
> org.apache.hadoop.mapred.TaskInProgress: TaskInProgress
> task_200909081006_0004_m_000000 has failed 4 times.
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobInProgress:
> TaskTracker at 'domU-12-31-39-07-50-C2.compute-1.internal' turned
> 'flaky'
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobInProgress:
> Aborting job job_200909081006_0004
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobInProgress:
> Killing job 'job_200909081006_0004'
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobTracker:
> Adding task 'attempt_200909081006_0004_m_000001_0' to tip
> task_200909081006_0004_m_000001, for tracker
> 'tracker_domU-12-31-39-07-50-
> C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:07,556 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000000_3' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,561 INFO org.apache.hadoop.mapred.JobInProgress:
> Task 'attempt_200909081006_0004_m_000001_0' has completed
> task_200909081006_0004_m_000001 successfully.
> 2009-09-08 10:17:10,606 INFO org.apache.hadoop.mapred.JobHistory:
> Recovered job history filename for job job_200909081006_0004 is
> domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000000_0' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000000_1' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000000_2' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000000_3' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000001_0' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker:
> Removed completed task 'attempt_200909081006_0004_m_000002_0' from
> 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> [root@domU-12-31-39-07-50-C2 ~]#
>
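[Editor's note] The stack trace above points at org.apache.pig.builtin.PigStorage.readField growing an ArrayList byte by byte until it sees the field delimiter. If a record never contains the '\u0002' delimiter specified in the LOAD statement, that buffer grows toward the size of the whole decompressed file, which matches the heap exhaustion seen here. A minimal Python sketch of that failure mode (not Pig's actual code; names are illustrative):

```python
import io

def read_field(stream, delimiter=b"\x02"):
    # Buffer bytes until the delimiter, mirroring how PigStorage.readField
    # appends to an ArrayList per field. If the delimiter never appears,
    # the buffer keeps growing until the heap is exhausted.
    buf = bytearray()
    while True:
        b = stream.read(1)
        if not b or b == delimiter:  # end of stream or end of field
            return bytes(buf)
        buf += b

# A record that does contain the '\u0002' delimiter parses normally:
print(read_field(io.BytesIO(b"abc\x02def")))  # b'abc'
```

With no delimiter in the input, the same function buffers the entire stream before returning, which is the OOM pattern in the log above.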
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/mnt/hadoop</value>
> </property>
>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://domU-12-31-39-07-50-C2.compute-1.internal:50001</value>
> </property>
>
> <property>
> <name>mapred.job.tracker</name>
> <value>hdfs://domU-12-31-39-07-50-C2.compute-1.internal:50002</value>
> </property>
>
> <property>
> <name>tasktracker.http.threads</name>
> <value>80</value>
> </property>
>
> <property>
> <name>mapred.reduce.parallel.copies</name>
> <value>1</value>
> </property>
> <property>
> <name>mapred.reduce.tasks</name>
> <value>1</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>3</value>
> </property>
>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>3</value>
> </property>
>
> <property>
> <name>mapred.output.compress</name>
> <value>true</value>
> </property>
>
> <property>
> <name>mapred.output.compression.type</name>
> <value>BLOCK</value>
> </property>
>
> <property>
> <name>dfs.client.block.write.retries</name>
> <value>3</value>
> </property>
>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx550m</value>
> </property>
>
> <property>
> <name>io.compression.codecs</name>
>
> <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
> <description>A list of the compression codec classes that can be
> used for compression/decompression.</description>
> </property>
>
> <property>
> <name>fs.s3.awsAccessKeyId</name>
> <value>xxxx</value>
> </property>
> <property>
> <name>fs.s3.awsSecretAccessKey</name>
> <value>xxxx</value>
> </property>
> <property>
> <name>fs.s3n.awsAccessKeyId</name>
> <value>xxxx</value>
> </property>
> <property>
> <name>fs.s3n.awsSecretAccessKey</name>
> <value>xxxx</value>
> </property>
>
> </configuration>
>
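[Editor's note] The config above caps each task JVM at 550 MB via mapred.child.java.opts, and gzip input is not splittable, so a single map task must decompress and buffer records from the entire file. If the records really are delimited correctly, one hedged mitigation is simply a larger child heap; the value below is illustrative, not a recommendation for this cluster:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Raise per-task heap; choose a value that fits the instance's RAM
       times mapred.tasktracker.map.tasks.maximum (3 here). -->
  <value>-Xmx1024m</value>
</property>
```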